In Polynomial interpolation we saw how a polynomial can be used to interpolate data—that is, derive a continuous function that evaluates to give a set of prescribed values. But interpolation may not be appropriate in many applications.
Interpolating temperature data (Julia)
Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951–1980 average (source: NASA).
year = 1955:5:2000
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]
scatter(year, temp, label="data",
xlabel="year", ylabel="anomaly (degrees C)", leg=:bottomright)
A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
t = @. (year - 1950) / 10
n = length(t)
V = [ t[i]^j for i in 1:n, j in 0:n-1 ]
c = V \ temp
The coefficients in vector c are used to create a polynomial. Then we create a function that evaluates the polynomial after changing the time variable as we did for the Vandermonde matrix.
If you plot a function, the points are chosen automatically to make a smooth curve.
p = Polynomial(c)
f = yr -> p((yr - 1950) / 10)
plot!(f, 1955, 2000, label="interpolant")
As you can see, the interpolant does represent the data, in a sense. However, it's a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as overfitting.
Interpolating temperature data (MATLAB)
Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951–1980 average (source: NASA).
t = (1955:5:2000)';
y = [ -0.0480; -0.0180; -0.0360; -0.0120; -0.0040;
0.1180; 0.2100; 0.3320; 0.3340; 0.4560 ];
scatter(t, y), axis tight
xlabel('year')
ylabel('anomaly ({\circ}C)')
A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
t = (t - 1950) / 10;
n = length(t);
V = ones(n, 1); % t^0
for j = 1:n-1
V(:, j+1) = t .* V(:,j);
end
c = V \ y; % solve for coefficients
We created the Vandermonde matrix columns in increasing-degree order. Thus, the coefficients in c also follow that ordering, which is the opposite of what MATLAB uses. We need to flip the coefficients before using them in polyval.
p = @(year) polyval(c(end:-1:1), (year - 1950) / 10);
hold on
fplot(p, [1955, 2000]) % plot the interpolating function
Interpolating temperature data (Python)
Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951–1980 average (source: NASA).
year = arange(1955,2005,5)
y = array([ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ])
fig, ax = subplots()
ax.scatter(year, y, color="k", label="data")
xlabel("year")
ylabel("anomaly (degrees C)")
title("World temperature anomaly");
A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
t = (year - 1950) / 10
V = vander(t)
c = solve(V, y)
print(c)
The coefficients in vector c are used to create a polynomial. Then we create a function that evaluates the polynomial after changing the time variable as we did for the Vandermonde matrix.
p = poly1d(c) # convert to a polynomial
tt = linspace(1955, 2000, 500)
ax.plot(tt, p((tt - 1950) / 10), label="interpolant")
ax.legend();
fig
As you can see, the interpolant does represent the data, in a sense. However, it's a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as overfitting.
In many cases we can get better results by relaxing the interpolation requirement. In the polynomial case this allows us to lower the degree of the polynomial, which limits the number of local max and min points. Let $(t_i,y_i)$ for $i=1,\ldots,m$ be the given points. We will represent the data by the polynomial

$$
y \approx f(t) = c_1 + c_2 t + \cdots + c_{n-1} t^{n-2} + c_n t^{n-1},
$$

with $n<m$. Just as in (2.1.3), we can express a vector of $f$-values by a matrix-vector multiplication. In other words, we seek an approximation

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_m \end{bmatrix} \approx
\begin{bmatrix} f(t_1) \\ f(t_2) \\ f(t_3) \\ \vdots \\ f(t_m) \end{bmatrix} =
\begin{bmatrix}
1 & t_1 & \cdots & t_1^{n-1} \\
1 & t_2 & \cdots & t_2^{n-1} \\
1 & t_3 & \cdots & t_3^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & t_m & \cdots & t_m^{n-1}
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
= \mathbf{V} \mathbf{c}.
$$

Note that $\mathbf{V}$ has the same structure as the Vandermonde matrix in (2.1.3) but is $m \times n$, thus taller than it is wide. It's impossible in general to satisfy $m$ conditions with $n<m$ variables, and we say the system is overdetermined. Rather than solving the system exactly, we have to find a best approximation. Below we specify precisely what is meant by this, but first we note that Julia uses the same backslash notation to solve the problem in both the square and overdetermined cases.
Fitting temperature data (Julia)
Here are the 5-year temperature averages again.
year = 1955:5:2000
t = @. (year-1950)/10
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]
The standard best-fit line results from using a linear polynomial that meets the least-squares criterion.
Backslash solves overdetermined linear systems in a least-squares sense.
V = [ t.^0 t ] # Vandermonde-ish matrix
@show size(V)
c = V \ temp
p = Polynomial(c)
f = yr -> p((yr - 1950) / 10)
scatter(year, temp, label="data",
xlabel="year", ylabel="anomaly (degrees C)", leg=:bottomright)
plot!(f, 1955, 2000, label="linear fit")
If we use a global cubic polynomial, the points are fit more closely.
V = [ t[i]^j for i in 1:length(t), j in 0:3 ]
@show size(V);
Now we solve the new least-squares problem to redefine the fitting polynomial.
The definition of f above is in terms of p. When p is changed, f calls the new version.
p = Polynomial( V \ temp )
plot!(f, 1955, 2000, label="cubic fit")
If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
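To quantify that trade-off, here is a minimal sketch (reusing the t and temp vectors defined above; the loop and the choice of degrees are illustrative, not part of the original demo) that prints the 2-norm of the residual at the data points as the fitting degree increases.
using LinearAlgebra    # for norm (may already be loaded by the demo setup)
for degree in 1:2:9
    Vd = [ t[i]^j for i in eachindex(t), j in 0:degree ]   # 10×(degree+1) fitting matrix
    cd = Vd \ temp                                          # least-squares coefficients
    println("degree $degree: residual norm = ", norm(temp - Vd*cd))
end
With only 10 data points, degree 9 reproduces the data exactly (zero residual up to roundoff), which is precisely the overfitting regime described above.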
Fitting temperature data (MATLAB)
Here are the 5-year temperature averages again.
year = (1955:5:2000)';
y = [ -0.0480; -0.0180; -0.0360; -0.0120; -0.0040;
0.1180; 0.2100; 0.3320; 0.3340; 0.4560 ];
The standard best-fit line results from using a linear polynomial that meets the least-squares criterion.
Backslash solves overdetermined linear systems in a least-squares sense.
t = (year - 1955) / 10; % better matrix conditioning later
V = [ t.^0 t ]; % Vandermonde-ish matrix
size(V)
c = V \ y;
f = @(year) polyval(c(end:-1:1), (year - 1955) / 10);
clf
scatter(year, y), axis tight
xlabel('year'), ylabel('anomaly ({\circ}C)')
hold on
fplot(f, [1955, 2000])
If we use a global cubic polynomial, the points are fit more closely.
V = [t.^0, t.^1, t.^2, t.^3]; % Vandermonde-ish matrix
size(V)
Now we solve the new least-squares problem to redefine the fitting polynomial.
The definition of f above is in terms of c. When c is changed, f has to be redefined.
c = V \ y;
f = @(year) polyval(c(end:-1:1), (year - 1955) / 10);
fplot(f, [1955, 2000])
legend('data', 'linear', 'cubic', 'Location', 'northwest')
If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
Fitting temperature data (Python)
Here are the 5-year temperature averages again.
year = arange(1955, 2005, 5)
y = array([-0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560])
t = (year - 1950) / 10
The standard best-fit line results from using a linear polynomial that meets the least-squares criterion.
Backslash solves overdetermined linear systems in a least-squares sense.
V = array([ [t[i], 1] for i in range(t.size) ]) # Vandermonde-ish matrix
print(V.shape)
from numpy.linalg import lstsq
c, res, rank, sv = lstsq(V, y, rcond=None)
p = poly1d(c)
f = lambda year: p((year - 1950) / 10)
fig, ax = subplots()
ax.scatter(year, y, color="k", label="data")
yr = linspace(1955, 2000, 500)
ax.plot(yr, f(yr), label="linear fit")
xlabel("year")
ylabel("anomaly (degrees C)")
title("World temperature anomaly");
ax.legend();
If we use a global cubic polynomial, the points are fit more closely.
V = array([ [t[i]**3,t[i]**2,t[i],1] for i in range(t.size) ]) # Vandermonde-ish matrix
print(V.shape)
Now we solve the new least-squares problem to redefine the fitting polynomial.
The definition of f above is in terms of the polynomial p, so after computing the new coefficients we rebuild p before plotting again.
c, res, rank, sv = lstsq(V, y, rcond=None)
p = poly1d(c)  # rebuild the polynomial so that f uses the new fit
yr = linspace(1955, 2000, 500)
ax.plot(yr, f(yr), label="cubic fit")
fig
If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
In the most general terms, our fitting functions take the form

$$
f(t) = c_1 f_1(t) + \cdots + c_n f_n(t),
$$

where $f_1,\ldots,f_n$ are all known functions with no undetermined parameters. This leaves only $c_1,\ldots,c_n$ to be determined. The essential feature of a linear least-squares problem is that the fit depends only linearly on the unknown parameters. For instance, a function of the form $f(t)=c_1 + c_2 e^{c_3 t}$ is not of this type.
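As a small illustrative sketch (with made-up data and an arbitrarily chosen basis; none of this comes from the demos above), here is how such a general linear fit is set up in Julia: each basis function fills one column of the matrix, and the fit stays linear in the coefficients even though the basis functions themselves are not linear in t.
ts = range(0, 2π, length=40)                                # made-up sample times
ys = 1 .+ 0.5 .* cos.(ts) .- 0.3 .* sin.(ts) .+ 0.05 .* randn(40)   # made-up noisy data
A = [ ones(length(ts)) cos.(ts) sin.(ts) ]                  # columns are f₁(t)=1, f₂(t)=cos t, f₃(t)=sin t
coeffs = A \ ys                                             # least-squares coefficients c₁, c₂, c₃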
At each observation $(t_i,y_i)$, we define a residual, $y_i - f(t_i)$. A sensible formulation of the fitting criterion is to minimize

$$
R(c_1,\ldots,c_n) = \sum_{i=1}^m \,[\, y_i - f(t_i) \,]^2
$$

over all possible choices of parameters $c_1,\ldots,c_n$. We can apply linear algebra to write the problem in the form $R=\mathbf{r}^T \mathbf{r}$, where
$$
\mathbf{r} =
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{m-1} \\ y_m \end{bmatrix} -
\begin{bmatrix}
f_1(t_1) & f_2(t_1) & \cdots & f_n(t_1) \\
f_1(t_2) & f_2(t_2) & \cdots & f_n(t_2) \\
& \vdots & & \\
f_1(t_{m-1}) & f_2(t_{m-1}) & \cdots & f_n(t_{m-1}) \\
f_1(t_m) & f_2(t_m) & \cdots & f_n(t_m)
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.
$$

Recalling that $\mathbf{r}^T\mathbf{r}=\| \mathbf{r} \|_2^2$, and renaming the variables to standardize the statement, we arrive at the general linear least-squares problem: given $\mathbf{A}\in\mathbb{R}^{m\times n}$ and $\mathbf{b}\in\mathbb{R}^{m}$, with $m>n$, find

$$
\operatorname{argmin}_{\mathbf{x}\in\mathbb{R}^{n}} \; \| \mathbf{b} - \mathbf{A}\mathbf{x} \|_2^2.
$$
The notation argmin above means to find an $\mathbf{x}$ that produces the minimum value.
While we could choose to minimize in any vector norm, the 2-norm is the most common and convenient choice. For the rest of this chapter we exclusively use the 2-norm. In the edge case $m=n$ with a nonsingular $\mathbf{A}$, the definitions of the linear least-squares and linear systems problems coincide: the solution of $\mathbf{A}\mathbf{x}=\mathbf{b}$ gives $\mathbf{r}=\boldsymbol{0}$, which is a global minimum of $\| \mathbf{r} \|_2^2 \ge 0$.
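As a quick sanity check, here is a sketch with made-up data showing that Julia's backslash on a tall matrix produces this minimizer. The comparison uses the classical normal-equations formula $\mathbf{A}^T\mathbf{A}\mathbf{x}=\mathbf{A}^T\mathbf{b}$, which is a standard characterization of the least-squares solution rather than anything derived in this section.
using LinearAlgebra
A = rand(20, 4)               # 20 equations in 4 unknowns: overdetermined
b = rand(20)
x = A \ b                     # minimizes ‖b - Ax‖₂
x_ne = (A' * A) \ (A' * b)    # normal-equations solution, for comparison
@show norm(x - x_ne)          # the two agree up to roundoff
@show norm(b - A * x)         # the minimal residual norm is generally nonzero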
3.1.2 Change of variables
The most familiar and common case of a polynomial least-squares fit is the straight line, $f(t) = c_1 + c_2 t$. Certain other fit functions can be transformed into this situation. For example, suppose we want to fit data using $g(t)= a_1 e^{a_2 t}$. Then

$$
\log y \approx \log g(t) = (\log a_1) + a_2 t = c_1 + c_2 t.
$$

While the fit of the $y_i$ to $g(t)$ depends nonlinearly on the fitting parameters, the fit of $\log y$ to a straight line is a linear problem. Similarly, the power-law relationship $y \approx f(t) = a_1 t^{a_2}$ is equivalent to

$$
\log y \approx (\log a_1) + a_2 (\log t).
$$

Thus, the variable $z=\log y$ can be fit linearly in terms of the variable $s=\log t$. In practice these two cases, exponential fit and power law, are easily detected by using log-linear or log-log plots, respectively.
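As a minimal sketch of the exponential case (with made-up data; the variable names are illustrative and not part of any demo), the change of variables turns the fit of $y \approx a_1 e^{a_2 t}$ into a straight-line fit of $\log y$:
t = range(0, 3, length=30)                             # made-up sample times
y = 2 .* exp.(0.7 .* t) .* (1 .+ 0.02 .* randn(30))    # made-up data with an exponential trend
V = [ ones(length(t)) t ]                              # matrix for the linear fit of log y
c = V \ log.(y)                                        # c₁ = log a₁, c₂ = a₂
a1, a2 = exp(c[1]), c[2]                               # transform back to the original parameters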
Finding numerical approximations to π has fascinated people for millennia. One famous formula is
$$
\frac{\pi^2}{6} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots.
$$

Say $s_k$ is the sum of the first $k$ terms of the series above, and $p_k = \sqrt{6 s_k}$. Here is a fancy way to compute these sequences in a compact code.
Fitting a power law (Julia)
a = [1/k^2 for k=1:100]
s = cumsum(a) # cumulative summation
p = @. sqrt(6*s)
scatter(1:100, p, title="Sequence convergence",
xlabel=L"k", ylabel=L"p_k")
This graph suggests that maybe $p_k \to \pi$, but it's far from clear how close the sequence gets. It's more informative to plot the sequence of errors, $\epsilon_k = |\pi - p_k|$. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.
ϵ = @. abs(π - p) # error sequence
scatter(1:100, ϵ, title="Convergence of errors",
xaxis=(:log10,L"k"), yaxis=(:log10,"error"))
The straight line on the log-log scale suggests a power-law relationship where $\epsilon_k \approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.
k = 1:100
V = [ k.^0 log.(k) ] # fitting matrix
c = V \ log.(ϵ) # coefficients of linear fit
In terms of the parameters $a$ and $b$ used above, we have
a, b = exp(c[1]), c[2];
@show b;
It's tempting to conjecture that the slope $b \to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.
plot!(k, a * k.^b, l=:dash, label="power-law fit")
Fitting a power law (MATLAB)
k = (1:100)';
a = 1./k.^2; % sequence
s = cumsum(a); % cumulative summation
p = sqrt(6*s);
clf
plot(k, p, 'o-')
xlabel('k'), ylabel('p_k')
title('Sequence converging to \pi')
This graph suggests that maybe $p_k \to \pi$, but it's far from clear how close the sequence gets. It's more informative to plot the sequence of errors, $\epsilon_k = |\pi - p_k|$. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.
ep = abs(pi - p); % error sequence
loglog(k, ep, 'o')
title('Convergence')
xlabel('k'), ylabel('|p_k - \pi|'), axis tight
The straight line on the log-log scale suggests a power-law relationship where $\epsilon_k \approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.
V = [ k.^0, log(k) ]; % fitting matrix
c = V \ log(ep) % coefficients of linear fit
In terms of the parameters $a$ and $b$ used above, we have
a = exp(c(1)); b = c(2);   % recover a and b from the linear fit
It's tempting to conjecture that the slope $b \to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.
hold on
loglog(k, a * k.^b)
legend('sequence', 'power-law fit')
Fitting a power law (Python)
a = array([1 / (k+1)**2 for k in range(100)])
s = cumsum(a) # cumulative summation
p = sqrt(6*s)
plot(range(100), p, "o")
xlabel("$k$")
ylabel("$p_k$")
title("Sequence convergence");
This graph suggests that maybe $p_k \to \pi$, but it's far from clear how close the sequence gets. It's more informative to plot the sequence of errors, $\epsilon_k = |\pi - p_k|$. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.
ep = abs(pi - p) # error sequence
loglog(range(100), ep, "o")
xlabel("$k$")
ylabel("error")
title("Sequence convergence");
The straight line on the log-log scale suggests a power-law relationship where $\epsilon_k \approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.
V = array([ [1, log(k+1)] for k in range(100) ]) # fitting matrix
c = lstsq(V, log(ep), rcond=None)[0] # coefficients of linear fit
print(c)
In terms of the parameters $a$ and $b$ used above, we have
a, b = exp(c[0]), c[1]
print(f"b: {b:.3f}")
It's tempting to conjecture that the slope $b \to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.
loglog(range(100), ep, "o", label="sequence")
k = arange(1,100)
plot(k, a*k**b, "--", label="power fit")
xlabel("$k$"); ylabel("error");
legend(); title("Sequence convergence");
3.1.3 Exercises
✍ Suppose $f$ is a twice-differentiable, nonnegative real function. Show that if there is an $x^*$ such that $f'(x^*)=0$ and $f''(x^*)>0$, then $x^*$ is a local minimizer of the function $[f(x)]^2$.
⌨ Here are counts of the U.S. population in millions from the census performed every ten years, beginning with 1790 and ending with 2010.
3.929, 5.308, 7.240, 9.638, 12.87, 17.07, 23.19, 31.44, 39.82, 50.19, 62.95, 76.21,
92.22, 106.0, 122.8, 132.2, 150.7, 179.3, 203.3, 226.5, 248.7, 281.4, 308.7
(a) Find a best-fitting cubic polynomial for the data. Plot the data as points superimposed on a (smooth) graph of the cubic over the full range of time. Label the axes. What does the fit predict for the population in the years 2000, 2010, and 2020?
(b) Look up the actual U.S. population in 2000, 2010, and 2020 and compare to the predictions of part (a).
⌨ The following are weekly box office earnings (in dollars) in the U.S. for the 2012 film The Hunger Games . (Source: boxofficemojo.com .)
189_932_838, 79_406_327, 46_230_374, 26_830_921, 18_804_290,
13_822_248, 7_474_688, 6_129_424, 4_377_675, 3_764_963, 2_426_574,
1_713_298, 1_426_102, 1_031_985, 694_947, 518_242, 460_578, 317_909
(Note that Julia lets you use _ where you would normally put a comma in a long number.) Fit these values to a function of the form $y(t) \approx a e^{bt}$. Plot the data together with the fit using standard linear scales on the axes, and then plot them again using a log scale on the vertical axis.
⌨ In this problem you are trying to find an approximation to the periodic function $g(t)=e^{\sin(t-1)}$ over one period, $0 < t \le 2\pi$. As data, define

$$
t_i = \frac{2\pi i}{60}, \qquad y_i = g(t_i), \qquad i=1,\ldots,60.
$$

(a) Find the coefficients of the least-squares fit

$$
y(t) \approx c_1 + c_2 t + \cdots + c_7 t^6.
$$

Superimpose a plot of the data values as points with a curve showing the fit.
(b) Find the coefficients of the least-squares fit

$$
y \approx d_1 + d_2\cos(t) + d_3\sin(t) + d_4\cos(2t) + d_5\sin(2t).
$$

Unlike part (a), this fitting function is itself periodic. Superimpose a plot of the data values as points with a curve showing the fit.
⌨ Define the following data in Julia.
(a) Fit the data to a cubic polynomial. Plot the data together with the polynomial fit over the interval $0 \le t \le 10$.
(b) Fit the data to the function $c_1 + c_2 z + c_3 z^2 + c_4 z^3$, where $z = t^2/(1+t^2)$. Plot the data together with the fit. What feature of $z$ makes this fit much better than the original cubic?
⌨ One series for finding π is
$$
\frac{\pi}{2} = 1 + \frac{1}{3} + \frac{1\cdot 2}{3\cdot 5} + \frac{1\cdot 2\cdot 3}{3\cdot 5\cdot 7} + \cdots.
$$

Define $s_k$ to be the sum of the first $k$ terms on the right-hand side, and let $e_k = |s_k - \pi/2|$.
(a) Calculate $e_k$ for $k=1,\ldots,20$, and plot the sequence on a log-linear scale.
(b) Determine $a$ and $b$ in a least-squares fit $e_k \approx a \cdot b^k$, and superimpose the fit on the plot from part (a).
⌨ Kepler found that the orbital period $\tau$ of a planet depends on its mean distance $R$ from the sun according to $\tau = c R^{\alpha}$ for a simple rational number $\alpha$. Perform a linear least-squares fit from the following table in order to determine the most likely simple rational value of $\alpha$.

| Planet | Distance from sun (Mkm) | Orbital period (days) |
| --- | --- | --- |
| Mercury | 57.59 | 87.99 |
| Venus | 108.11 | 224.7 |
| Earth | 149.57 | 365.26 |
| Mars | 227.84 | 686.98 |
| Jupiter | 778.14 | 4332.4 |
| Saturn | 1427 | 10759 |
| Uranus | 2870.3 | 30684 |
| Neptune | 4499.9 | 60188 |
✍ Show that finding a fit of the form

$$
y(t) \approx \frac{a}{t+b}
$$

can be transformed into a linear fitting problem (with different undetermined coefficients) by rewriting the equation.
✍ Show how to find the constants $a$ and $b$ in a data fitting problem of the form $y(t) \approx t/(a t + b)$ for $t>1$ by transforming it into a linear least-squares fitting problem.