Fitting functions to data

In Polynomial interpolation we saw how a polynomial can be used to interpolate data, that is, to construct a continuous function that passes exactly through a set of prescribed values. But interpolation may not be appropriate in many applications.

In many cases we can get better results by relaxing the interpolation requirement. In the polynomial case this allows us to lower the degree of the polynomial, which limits the number of local maxima and minima. Let \((t_i,y_i)\) for \(i=1,\ldots,m\) be the given points. We will represent the data by the polynomial

(57)\[y \approx f(t) = c_1 + c_2t + \cdots + c_{n-1} t^{n-2} + c_n t^{n-1},\]

with \(n<m\). Just as in (26), we can express a vector of \(f\)-values by a matrix-vector multiplication:

(58)\[\begin{split}\begin{bmatrix} f(t_1) \\ f(t_2) \\ f(t_3) \\ \vdots \\ f(t_m) \end{bmatrix} = \begin{bmatrix} 1 & t_1 & \cdots & t_1^{n-1} \\ 1 & t_2 & \cdots & t_2^{n-1} \\ 1 & t_3 & \cdots & t_3^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & t_m & \cdots & t_m^{n-1} \\ \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.\end{split}\]

Call the matrix in (58) \(\mathbf{V}\). It has the same structure as the Vandermonde matrix in (26) but is \(m\times n\), thus taller than it is wide. It’s impossible in general to satisfy \(m\) conditions with only \(n<m\) variables, so rather than solving a linear system \(\mathbf{V} \mathbf{c} = \mathbf{y}\), we have to settle for an approximation \(\mathbf{V} \mathbf{c} \approx \mathbf{y}\). Below we specify precisely what is meant by this, but first we note that Julia, like MATLAB, conveniently uses the same backslash notation to solve the problem for both square and rectangular matrices.
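For instance, here is a minimal Julia sketch of this idea; the vectors `t` and `y` below are invented purely for illustration. It builds the matrix of (58) and lets backslash compute the fit.

    using LinearAlgebra

    # Invented sample data: m = 7 points that do not lie on any single quadratic.
    t = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    y = [1.0, 1.6, 1.9, 2.4, 2.9, 3.1, 3.6]

    m = length(t)
    n = 3                                       # polynomial of degree n - 1 = 2
    V = [ t[i]^(j-1) for i in 1:m, j in 1:n ]   # m×n Vandermonde-type matrix as in (58)

    c = V \ y                    # with m > n, backslash returns the least squares fit
    @show norm(y - V*c)          # the residual is small but, in general, not zero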

The least squares formulation

In the most general terms, our fitting functions take the form

(59)\[f(t) = c_1 f_1(t) + \cdots + c_n f_n(t),\]

where \(f_1,\ldots,f_n\) are all known functions with no undetermined parameters. This leaves only \(c_1,\ldots,c_n\) to be determined. The essential feature of a linear least squares problem is that the fit depends only linearly on the unknown parameters. For instance, a function of the form \(f(t)=c_1 + c_2 e^{c_3 t}\) is not of this type.

At each observation \((t_i,y_i)\), we define a residual, \(y_i - f(t_i)\). A sensible formulation of the fitting criterion is to minimize

\[ R(c_1,\ldots,c_n) = \sum_{i=1}^m\, [ y_i - f(t_i) ]^2,\]

over all possible choices of parameters \(c_1,\ldots,c_n\). We can apply linear algebra to write the problem in the form \(R=\mathbf{r}^T \mathbf{r}\), where

\[\begin{split}\mathbf{r} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{m-1} \\ y_m \end{bmatrix} - \begin{bmatrix} f_1(t_1) & f_2(t_1) & \cdots & f_n(t_1) \\[1mm] f_1(t_2) & f_2(t_2) & \cdots & f_n(t_2) \\[1mm] \vdots & \vdots & & \vdots \\[1mm] f_1(t_{m-1}) & f_2(t_{m-1}) & \cdots & f_n(t_{m-1}) \\[1mm] f_1(t_m) & f_2(t_m) & \cdots & f_n(t_m) \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.\end{split}\]

Recalling that \(\mathbf{r}^T\mathbf{r}=\| \mathbf{r} \|_2^2\), and renaming the variables to standardize the statement, we arrive at the general linear least squares problem:

Definition 21

Given \(\mathbf{A}\in\mathbb{R}^{m \times n}\) and \(\mathbf{b}\in\mathbb{R}^m\), with \(m>n\), find

(60)\[\operatorname{argmin}_{\mathbf{x}\in \mathbb{R}^n} \| \mathbf{b}-\mathbf{A} \mathbf{x} \|_2^2.\]

The notation argmin above means to find an \(\mathbf{x}\) that produces the minimum value.

While we could choose to minimize in any vector norm, the 2-norm is the most common and convenient choice, and we use it exclusively for the rest of this chapter. Any \(\mathbf{x}\) that achieves the minimum residual is considered a solution of the problem. In the edge case \(m=n\) with a nonsingular \(\mathbf{A}\), the linear least squares problem and the linear system problem coincide: the solution of \(\mathbf{A}\mathbf{x}=\mathbf{b}\) gives \(\mathbf{r}=\boldsymbol{0}\), which is the global minimum of \(\| \mathbf{r} \|_2^2 \ge 0\).
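To make the definition concrete, here is a small hedged sketch in Julia; the basis functions, data, and noise level are all invented for illustration. It assembles \(\mathbf{A}\) from known functions \(f_1,\ldots,f_n\), solves with backslash, and checks that perturbing the solution does not decrease the residual norm.

    using LinearAlgebra

    # Invented data: noisy samples of a smooth function at m = 20 points.
    m = 20
    t = range(0, 3, length=m)
    b = 2 .+ 0.5 .* sin.(t) .+ 0.1 .* randn(m)

    # Known basis functions f_1, f_2, f_3 (chosen arbitrarily for this sketch).
    basis = [s -> 1.0, s -> sin(s), s -> cos(s)]
    A = [ f(ti) for ti in t, f in basis ]       # m×n matrix with entries f_j(t_i)

    x = A \ b                                   # minimizes ‖b - A*x‖₂
    @show norm(b - A*x)                         # the minimum residual norm
    @show norm(b - A*(x .+ 0.01))               # perturbing x cannot make the residual smaller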

Change of variables

The most familiar and common case of a polynomial least squares fit is the straight line, \(f(t) = c_1 + c_2 t\). Certain other fit functions can be transformed into this situation. For example, suppose we want to fit data using \(g(t)= a_1 e^{a_2 t}\). Then

(61)\[\log y \approx \log g(t) = (\log a_1) + a_2 t = c_1 + c_2 t.\]

While the fit of the \(y_i\) to \(g(t)\) depends nonlinearly on the fitting parameters \(a_1\) and \(a_2\), the fit of \(\log y_i\) to a straight line is a linear problem in \(c_1\) and \(c_2\). Similarly, the power-law relationship \(y\approx f(t)=a_1 t^{a_2}\) is equivalent to

(62)\[\log y \approx (\log a_1) + a_2 (\log t).\]

Thus the variable \(z=\log y\) can be fit linearly in terms of the variable \(s=\log t\). In practice these two cases—exponential fit and power-law—are easily detected by using log-linear or log-log plots, respectively.
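As a brief sketch of the exponential case (the data below is synthetic, generated only to illustrate the transformation in (61)), we fit \(\log y_i\) by a straight line and then undo the logarithm to recover \(a_1\) and \(a_2\).

    # Synthetic data approximately following y = 2.0 * exp(0.7t), with mild noise.
    t = 0:0.25:3
    y = 2.0 .* exp.(0.7 .* t) .* (1 .+ 0.02 .* randn(length(t)))

    # Linear least squares fit of log(y) to c1 + c2*t, as in (61).
    V = [ones(length(t)) t]        # columns for the constant and linear terms
    c = V \ log.(y)

    a1 = exp(c[1])                 # since c1 = log(a1)
    a2 = c[2]
    @show a1, a2                   # should be close to 2.0 and 0.7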

Exercises

  1. ✍ Suppose \(f(x)\) is a differentiable nonnegative real function. Use calculus to prove that the local minimizers of \(f(x)\) and \([f(x)]^2\) are the same values of \(x\).

  2. ⌨ Here are counts of the U.S. population, in millions, from the census performed every ten years, beginning with 1790 and ending with 2010.

    3.929, 5.308, 7.240, 9.638, 12.87, 17.07, 23.19, 31.44, 39.82, 50.19, 62.95, 76.21,
    92.22, 106.0, 122.8, 132.2, 150.7, 179.3, 203.3, 226.5, 248.7, 281.4, 308.7
    

    **(a)** Find a best-fitting cubic polynomial for the data. Plot the data as points superimposed on a (smooth) graph of the cubic over the full range of time. Label the axes. What does the fit predict for the population in the years 2000, 2010, and 2020?

    **(b)** Repeat (a) but use a fitting function of the form \(y(t)\approx a e^{b t}\).

    **(c)** Look up the actual U.S. population in 2000 and 2010 and compare to the predictions of parts (a) and (b).

  3. ⌨ Define the following data in Julia.

    t = 0:0.5:10
    y = tanh.(t)
    

    **(a)** Fit the data to a cubic polynomial. Plot the data together with the polynomial fit.

    **(b)** Fit the data to the function \(c_1 + c_2z + c_3z^2 + c_4z^3\), where \(z=t^2/(1+t^2)\). Plot the data together with the fit. What feature of \(z\) makes this fit much better than the original cubic?

  4. ⌨ One series for finding \(\pi\) is

    \[\frac{\pi}{2} = 1 + \frac{1}{3} + \frac{1\cdot 2}{3\cdot5} + \frac{1\cdot 2\cdot 3}{3\cdot 5\cdot 7} + \cdots.\]

    Define \(s_k\) to be the sum of the first \(k\) terms (that is, the terms up through \(j=k-1\), with the terms indexed starting from \(j=0\)) and let \(e_k=|\pi-2s_k|\). Do a numerical experiment to determine which is a better fit: \(e_k\approx a b^k\) or \(e_k\approx a k^b\). Then determine the parameters in that fit.

  5. ⌨ Kepler found that the orbital period \(\tau\) of a planet depends on its mean distance \(R\) from the sun according to \(\tau=c R^{\alpha}\) for a simple rational number \(\alpha\). Validate Kepler’s result by using a linear least squares fit and the following table.

    | Planet  | Distance from sun in Mkm | Orbital period in days |
    |:--------|-------------------------:|-----------------------:|
    | Mercury | 57.59                    | 87.99                  |
    | Venus   | 108.11                   | 224.7                  |
    | Earth   | 149.57                   | 365.26                 |
    | Mars    | 227.84                   | 686.98                 |
    | Jupiter | 778.14                   | 4332.4                 |
    | Saturn  | 1427                     | 10759                  |
    | Uranus  | 2870.3                   | 30684                  |
    | Neptune | 4499.9                   | 60188                  |
    | Pluto   | 5909                     | 90710                  |

  6. ✍ Show that finding a best fit of the form

    \[y(t) \approx \frac{a}{t+b}\]

    can be transformed into a linear fitting problem (with different undetermined coefficients) by rewriting the equation.

  7. ✍ Show how to find the constants \(a\) and \(b\) in a data fitting problem of the form \(y(t)\approx t/(at+b)\) by transforming it into a linear least squares fitting problem.