Data Analysis via Interpolation

Overview, Objectives, and Key Terms

In this lecture, we continue the theme of data analysis initiated in Lecture 24, focusing now on the process of interpolation. Whereas the fitted models of Lecture 24 allow one to produce simple models from noisy (or otherwise complicated) data, interpolation allows one to estimate new values from known discrete points that are assumed to be noise free.

Objectives

By the end of this lesson, you should be able to

  • Apply linear interpolation to evaluate \(f(x)\) given discrete points \((x_i, f(x_i))\) for \(i = 0, 1 \ldots\).
  • Apply linear interpolation to evaluate \(f(x, y)\) given discrete points \((x_i, y_j, f(x_i, y_j))\) for \(\, i = 0, 1, \ldots\) and \(j = 0, 1,\ldots\).
  • Apply tools from NumPy and SciPy to perform high-order (e.g., cubine-spline) interpolation

Key Terms

  • linear interpolation
  • np.interp
  • scipy.interpolate
  • interp1d
  • interp2d

Simple Linear Interpolation

A common task in many thermal engineering courses is to use tabulated physical properties. Often, the value needed is not in the table, and linear interpolation is required to estimate the missing value.

Let the quantity of interest be \(y(x)\), for which values \(y_0, y_1, \ldots\) are provided at \(x_0, x_1, \ldots\). Suppose the value of \(y\) at \(x = \tilde{x}\) is desired. The first step is to determine which two values \(x_i\) and \(x_{i+1}\) contain the point \(\tilde{x}\). This step is accomplished using the linear or binary search algorithms developed in Lecture 14.


Exercise: Apply linear search to find \(i\) such that \(\tilde{x} \in [x_i, x_{i+1}]\) for \(\tilde{x} = 3.5\) and \(x = [0, 1, 2, 3, 4, 5]\).


Exercise: Repeat the exercise using binary search.


Exercise: For the same \(\tilde{x}\) and points \(x\), evaluate np.array(x) < x_tilde and sum(np.array(x) < x_tilde). What can you conclude?


Exercise: Use help to learn about np.searchsorted. Then, for the same \(\tilde{x}\) and points \(x\), evaluate np.searchsorted(x, x_tilde). What can you conclude?


Once \(i\) is determined, \((x_i, y(x_i))\) and \((x_{i+1}, y(x_{i+1}))\) are connected by a line of the form \(ax + b\), and \(y(\tilde{x})\) is evaluated from \(a\tilde{x} + b\). The trick is to determine \(a\) and \(b\), but that’s easy: we have two equations \(y_i = ax_i + b\) and \(y_{i+1} = a x_{i+1} + b\). Just solve for \(a\) and \(b\):

\[a = \frac{y_{i+1}-y_i}{x_{i+1}-x_i} \quad \text{and} \quad b = y_i - \frac{y_{i+1}-y_i}{x_{i+1}-x_i} x_i \, .\]

Exercise: Consider the following table of thermodynamic data for saturated steam (saved as thermo_data.txt):

 T [K]   P [MPa]  h_l [kJ/kg] h_v [kJ/kg]
373.150   0.101    2675.572     419.099
389.817   0.179    2701.054     489.625
406.483   0.298    2724.633     560.634
423.150   0.476    2745.919     632.252
439.817   0.730    2764.528     704.626
456.483   1.082    2780.059     777.934
473.150   1.555    2792.062     852.393
489.817   2.175    2800.004     928.269
506.483   2.972    2803.247    1005.893
523.150   3.976    2801.012    1085.687
539.817   5.222    2792.314    1168.202
556.483   6.745    2775.820    1254.194
573.150   8.588    2749.574    1344.771

Evaluate \(h_l\) at 515 K using linear interpolation.

Solution

In [1]:
# first load the data
import numpy as np
T, P, h_l, h_v = np.loadtxt('thermo_data.txt', unpack=True, skiprows=1)
# find i such that T[i-1] <= 500 <= T[i]
T_tilde = 515
for i in range(len(T)):
    if T[i] > T_tilde:
        break
# compute slope and intercept
a = (h_l[i]-h_l[i-1])/(T[i]-T[i-1])
b = h_l[i-1] - a*T[i-1]
# compute approximation
h_l_tilde = a*T_tilde + b
h_l_tilde
Out[1]:
2802.1048931421374

Exercise: Adapt the solution above to produce a function that returns \(h_l\) at any temperature T. However, add assertions to ensure the user does not request \(h_l\) at temperatures outside the range of the data. In other words, limit the function to interpolation and exclude any extrapolation.


Exercise: Suppose \(\tilde{x}\) is beyond \(x_{n-1}\), the last of the \(n\) values of \(x\) in the tabulated data. How would one linearly extrapolate the data to obtain \(y(\tilde{x})\)?


Exercise: Execute help(np.interp). Use it to interpolate \(x = [1, 2, 3]\) and \(y = [2, 8, 18]\) for \(\tilde{x} = 1.5\) and \(\tilde{x} = 2.5\).

Interpolation In Two Dimensions

Some applications require interpolation of two-dimensional data evaluated at \((x_i, y_j)\) for \(i=0,1,\ldots\) and \(j=0,1,\ldots\). Linear interpolation of such data is based on the model \(f(x, y) \approx ax + by + c\) for \(x \in [x_i, x_{i+1}]\) and \(y \in [y_j, y_{j+1}]\).

The trick now is to find \(i\) and \(j\) for values \(\tilde{x}\) and \(\tilde{y}\) for which \(f(x, y)\) is desired. However, the same ideas applied for one dimension work for this two-dimensional problem. To do so requires an extra step:

  1. Again, first assume that \(i\) and \(j\) are found by an appropriate search.
  2. Then, find \(f(\tilde{x}, y_{j})\) and \(f(\tilde{x}, y_{j+1})\) via two separate interpolations.
  3. Finally, find \(f(\tilde{x}, \tilde{y})\) by linearly interpolating the two values \(f(\tilde{x}, y_{j})\) and \(f(\tilde{x}, y_{j+1})\).

Exercise: Given \(f(1, 1) = 6\), \(f(1, 2) = 9\), \(f(2, 1) = 8\), and \(f(2, 2) = 11\), estimate \(f(4/3, 7/4)\) by linear interpolation.

Solution:

In [2]:
# define the given data
x = [1, 2]
y = [1, 2]
f = np.array([[6, 9],
              [8, 11]])
xtilde = 4/3
ytilde = 7/4
# interpolate to get f(xtilde, 1)
a = (f[1, 0]-f[0, 0])/(x[1] - x[0])
b = f[0, 0] - a*x[0]
f_xt_1 = a*xtilde + b
# interpolate to get f(xtilde, 2)
a = (f[1, 1]-f[0, 1])/(x[1] - x[0])
b = f[0, 1] - a*x[0]
f_xt_2 = a*xtilde + b
# interpolate to get f(xtilde, ytilde)
a = (f_xt_2 - f_xt_1)/(y[1] - y[0])
b = f_xt_1 - a*y[0]
f_xt_yt = a*ytilde + b
f_xt_yt
Out[2]:
8.9166666666666661

Exercise: It turns out the previous exercise produces a perfect value for \(f(4/3, 7/4)\) when \(f(x, y) = 2x + 3y + 1\). Can you recover \(a=2\), \(b=3\), and \(c=1\) from your interpolation process?


Exercise: Suppose \(f(1, 1) = 7\), \(f(1, 2) = 11\), \(f(2, 1)=10\), and \(f(2, 2)=15\). Compute \(f(4/3, 7/4)\).


Exercise: Consider the following table of water densities (\(\rho\)) for different temperatures [K] and pressures [MPa] (from thermodata_2.txt):

 T [K]   P [MPa]  rho [kg/m^3]
293.150   2.000    999.073
293.150   4.000    999.982
293.150   6.000   1000.888
293.150   8.000   1001.791
313.150   2.000    993.053
313.150   4.000    993.924
313.150   6.000    994.790
313.150   8.000    995.653
333.150   2.000    984.037
333.150   4.000    984.903
333.150   6.000    985.766
333.150   8.000    986.625
353.150   2.000    972.650
353.150   4.000    973.538
353.150   6.000    974.422
353.150   8.000    975.301
373.150   2.000    959.242
373.150   4.000    960.172
373.150   6.000    961.097
373.150   8.000    962.017

Use linear interpolation to estimate \(\rho(300, 3)\), \(\rho(330, 5)\), and \(\rho(360, 7)\).


Higher-Order Interpolation

Linear interpolation is conceptually straightforward, but it helps to use tools like np.interp (see the exercises above). However, interpolation that is better than linear is more challenging to implement, and we’ll turn to the scipy.interpolate module and its interp1d and interp2d utilities.

The syntax for interp1d is quite simple: f = interp1d(x, y, kind), where x and y are the points to interpolate, and kind is a str argument equal to 'linear' (default), 'quadratic', 'cubic', among other possibilities. Here, linear leads to linear interpolation equivalent to the method introduced above. Quadratic and cubic interpolation involve defined quadratic curves over sets of three points or cubic functions over sets of four points. Of course, if the underlying data were actually quadratic or cubic, these higher-order interpolation schemes should be expected to do better. The resulting f is a callable function that accepts \(\tilde{x}\), the point at which the data is to be estimated.


Example: Interpolate \(x = [1, 2, 3, 4, 5]\) and \(y = [1, 4, 9, 16, 25]\) for \(\tilde{x} = 4.5\)

Solution:

In [3]:
from scipy.interpolate import interp1d
f_lin = interp1d([1,2,3,4,5], [1,4,9,16,25], kind='linear')
f_quad = interp1d([1,2,3,4,5], [1,4,9,16,25], kind='quadratic')
f_lin(4.5), f_quad(4.5), 4.5**2
Out[3]:
(array(20.5), array(20.25), 20.25)

The syntax for interp2d is equally simple: f = interp2d(x, y, f, kind). Here, x and y are individual arrays, while f is a 2-D array representing \(f(x_i, y_j)\) for \(i = 0, 1, \ldots\) and \(j = 0, 1, \ldots\). Any easy way to evaluate f is to use np.meshgrid. Finally, kind is a str argument that can be 'linear' and'cubic' (among others, but not 'quadratic').


Exercise: Given x = np.linspace(0, 10, 11), y = np.linspace(0, 10, 11), and z = lambda x, y: 1 + x + y + x**2 + y**2 + x**4*y**4 (a decidely nonlinear function), evaluate f(4.5, 6.5) using linea and cubic interpolation.

Solution

In [4]:
x = np.linspace(0, 10, 11)
y = np.linspace(0, 10, 11)
xx, yy  = np.meshgrid(x, y)
z = lambda x, y: 1 + x + y + x**2 + y**2 + x**4*y**4
from scipy.interpolate import interp2d
f_lin = interp2d(x, y, z(xx, yy), kind='linear')
f_cub = interp2d(x, y, z(xx, yy), kind='cubic')
f_lin(4.5, 6.5), f_cub(4.5, 6.5), z(4.5, 6.5)
Out[4]:
(array([ 814339.25]), array([ 731923.8169771]), 732061.69140625)

Further Reading

In this lecture, interpolation has been limited to regular grids of points along the \(x\) and \(y\) axes. When points are more than one dimension are not on such grids, different methods are needed. Barycentric interpolation is one option. In 2-D, points can be organized as triangles, and linear interpolation can be applied within a single triangle. In 3-D, triangles become pyramids, and in higher dimensions, one interpolates over general n-simplices.