
Chapter 3

Julia implementations

Functions

Solution of least squares by the normal equations
Solution of least squares by QR factorization
QR factorization by Householder reflections

Examples

Section 3.1

Interpolating temperature data

Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951–1980 average (source: NASA).

year = 1955:5:2000
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
       0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]
    
scatter(year, temp, label="data",
    xlabel="year", ylabel="anomaly (degrees C)", leg=:bottomright)

A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.

t = @. (year - 1950) / 10
n = length(t)
V = [ t[i]^j for i in 1:n, j in 0:n-1 ]
c = V \ temp
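
How much the rescaling helps the conditioning can be checked directly. Here is a quick comparison (an addition to the demo, not part of the original) against a Vandermonde matrix built from the raw years:

V_raw = [ Float64(year[i])^j for i in 1:n, j in 0:n-1 ]   # raw years, converted to Float64 to avoid integer overflow
@show cond(V), cond(V_raw);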

The coefficients in vector c are used to create a polynomial. Then we create a function that evaluates the polynomial after changing the time variable as we did for the Vandermonde matrix.

If you plot a function, then the points are chosen automatically to make a smooth curve.

p = Polynomial(c)
f = yr -> p((yr - 1950) / 10)
plot!(f, 1955, 2000, label="interpolant")

As you can see, the interpolant does represent the data, in a sense. However, it’s a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as overfitting.
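
One way to get a numerical feel for this (a quick sketch, not part of the demo) is to evaluate the interpolant halfway between the data years; an oscillating interpolant produces values there that stray from the trend of the data.

midyears = 1957.5:5:1997.5          # points between the data years
@show extrema(f.(midyears));
@show extrema(temp);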

Fitting temperature data

Here are the 5-year temperature averages again.

year = 1955:5:2000
t = @. (year-1950)/10
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
          0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]

The standard best-fit line results from using a linear polynomial that meets the least-squares criterion.

Backslash solves overdetermined linear systems in a least-squares sense.

V = [ t.^0 t ]    # Vandermonde-ish matrix
@show size(V)
c = V \ temp
p = Polynomial(c)
f = yr -> p((yr - 1950) / 10)
scatter(year, temp, label="data",
    xlabel="year", ylabel="anomaly (degrees C)", leg=:bottomright)
plot!(f, 1955, 2000, label="linear fit")

If we use a global cubic polynomial, the points are fit more closely.

V = [ t[i]^j for i in 1:length(t), j in 0:3 ]   
@show size(V);

Now we solve the new least-squares problem to redefine the fitting polynomial.

The definition of f above is in terms of p. When p is changed, then f calls the new version.

p = Polynomial( V \ temp )
plot!(f, 1955, 2000, label="cubic fit")

If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
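
As a rough illustration of that trade-off (a sketch, not part of the original demo), we can watch the residual norm shrink as the degree grows:

for d in 1:5
    Vd = [ t[i]^j for i in 1:length(t), j in 0:d ]    # degree-d fitting matrix
    resid = Vd * (Vd \ temp) - temp                   # least-squares residual at the data points
    println("degree $d: residual norm = $(norm(resid))")
end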

Fitting a power law

The partial sums $s_k$ of the series $\sum_{k \ge 1} 1/k^2 = \pi^2/6$ define a sequence $p_k = \sqrt{6 s_k}$ that converges to $\pi$.

a = [1/k^2 for k=1:100] 
s = cumsum(a)        # cumulative summation
p = @. sqrt(6*s)

scatter(1:100, p, title="Sequence convergence",
    xlabel=L"k", ylabel=L"p_k")

This graph suggests that maybe $p_k \to \pi$, but it’s far from clear how close the sequence gets. It’s more informative to plot the sequence of errors, $\epsilon_k = |\pi - p_k|$. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.

ϵ = @. abs(π - p)    # error sequence
scatter(1:100, ϵ, title="Convergence of errors",
    xaxis=(:log10,L"k"), yaxis=(:log10,"error"))

The straight line on the log-log scale suggests a power-law relationship where $\epsilon_k \approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.

k = 1:100
V = [ k.^0 log.(k) ]     # fitting matrix
c = V \ log.(ϵ)          # coefficients of linear fit

In terms of the parameters $a$ and $b$ used above, we have

a, b = exp(c[1]), c[2];
@show b;

It’s tempting to conjecture that the slope $b \to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.

plot!(k, a * k.^b, l=:dash, label="power-law fit")
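
One crude way to probe that conjecture (a sketch, not part of the original demo) is to refit using only the tail of the error sequence, where the asymptotic behavior should dominate:

tail = 50:100
V_tail = [ tail.^0 log.(tail) ]      # fitting matrix for the tail only
c_tail = V_tail \ log.(ϵ[tail])
@show slope = c_tail[2];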

Section 3.2

Instability in the normal equations

Because the functions $\sin^2(t)$, $\cos^2(t)$, and 1 are linearly dependent, we should find that the following matrix is somewhat ill-conditioned.

The local variable scoping rule for loops applies to comprehensions as well.

t = range(0, 3, 400)
f = [ x->sin(x)^2, x->cos((1+1e-7)*x)^2, x->1. ]
A = [ f(t) for t in t, f in f ]
@show κ = cond(A);
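
To see where the ill conditioning comes from, here is a side check (an addition to the demo): with the 1e-7 perturbation removed, the columns are exactly dependent, and the matrix is numerically rank-deficient.

A₀ = [ f(t) for t in t, f in [x->sin(x)^2, x->cos(x)^2, x->1.] ]   # unperturbed columns
@show rank(A₀);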

Now we set up an artificial linear least-squares problem with a known exact solution that actually makes the residual zero.

x = [1., 2, 1]
b = A * x;

Using backslash to find the least-squares solution, we get a relative error that is well below κ times machine epsilon.

x_BS = A \ b
@show observed_error = norm(x_BS - x) / norm(x);
@show error_bound = κ * eps();

If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa^2 \approx 10^{14}$, we may not be left with more than about 2 accurate digits.

N = A' * A
x_NE = N \ (A'*b)
@show observed_err = norm(x_NE - x) / norm(x);
@show digits = -log10(observed_err);
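
The loss of accuracy tracks the squaring of the condition number, which we can confirm with a quick comparison (an addition to the demo):

@show cond(N);
@show κ^2;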

Section 3.3

QR factorization

Julia provides access to both the thin and full forms of the QR factorization.

A = rand(1.:9., 6, 4)
@show m,n = size(A);

Here is a standard call:

Q,R = qr(A);
Q
R

If you look carefully, you see that we seemingly got a full $\mathbf{Q}$ but a thin $\mathbf{R}$. However, the $\mathbf{Q}$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form.

To enter the accented character Q̂, type Q\hat followed by Tab.

Q̂ = Matrix(Q)
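
A quick size check (an addition to the demo) confirms that the conversion produced the thin factor:

@show size(Q̂);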

We can test that $\mathbf{Q}$ is an orthogonal matrix:

opnorm(Q' * Q - I)

The thin $\hat{\mathbf{Q}}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC:

Q̂' * Q̂ - I

Stability of least-squares via QR

We’ll repeat the experiment of Demo 3.2.1, which exposed instability in the normal equations.

t = range(0, 3, 400)
f = [ x->sin(x)^2, x->cos((1+1e-7)*x)^2, x->1. ]
A = [ f(t) for t in t, f in f ]
x = [1., 2, 1]
b = A * x;

The error in the solution by Function 3.3.2 is similar to the bound predicted by the condition number.

observed_error = norm(FNC.lsqrfact(A,b) - x) / norm(x);
@show observed_error;
@show error_bound = cond(A) * eps();
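
Function 3.3.2 itself is not listed on this page, but the essential steps are easy to sketch (an illustration of the idea, not the function’s actual source): factor, apply $\hat{\mathbf{Q}}^T$ to the right-hand side, and back-substitute with $\mathbf{R}$.

Q, R = qr(A)
Q̂ = Matrix(Q)                  # thin factor
x_QR = R \ (Q̂' * b)            # solve Rx = Q̂ᵀb; R is upper triangular
@show norm(x_QR - x) / norm(x);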

Section 3.4

Householder QR factorization

We will use Householder reflections to produce a QR factorization of a random matrix.

The rand function can select randomly from within the interval $[0,1]$, or from a vector or range that you specify.

A = rand(float(1:9),6,4)
m,n = size(A)

Our first step is to introduce zeros below the diagonal in column 1 by using (3.4.4) and (3.4.1).

`I` can stand for an identity matrix of any size, inferred from the context when needed.

z = A[:, 1];
v = normalize(z - norm(z) * [1; zeros(m-1)])
P₁ = I - 2v * v'   # reflector

We check that this reflector introduces zeros as it should:

P₁ * z

Now we replace $\mathbf{A}$ by $\mathbf{P}\mathbf{A}$.

A = P₁ * A

We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. So we leave it out of the next reflector.

z = A[2:m, 2]
v = normalize(z - norm(z) * [1; zeros(m-2)])
P₂ = I - 2v * v'

We now apply this reflector to rows 2 and below only.

A[2:m, :] = P₂ * A[2:m, :]
A
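
As a quick check (an addition to the demo), the below-diagonal entries of the first two columns should now be at roundoff level:

@show norm(A[2:m, 1]), norm(A[3:m, 2]);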

We need to iterate the process for the last two columns.

for j in 3:n
    z = A[j:m, j]
    v = normalize(z - norm(z) * [1; zeros(m-j)])
    P = I - 2v * v'
    A[j:m, :] = P * A[j:m, :]
end

We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:

R = triu(A)
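
The entries below the diagonal of A are zero only up to roundoff; a quick look at what triu discarded (an addition to the demo) confirms this:

@show opnorm(A - R);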