Julia implementations
Functions
Solution of least squares by the normal equations
About the code
The syntax on line 9 is a field reference to extract the matrix we want from the structure returned by cholesky.
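The listing itself is not shown on this page. Below is a minimal sketch of a normal-equations solver consistent with that description; the function name lsnormal is an assumption, and the line numbering of this sketch need not match the "line 9" reference.
function lsnormal(A, b)
    # assumes `using LinearAlgebra` for cholesky
    N = A' * A                 # assemble the normal-equations matrix
    z = A' * b
    R = cholesky(N).U          # field reference extracts the upper Cholesky factor
    w = R' \ z                 # forward substitution
    x = R \ w                  # backward substitution
    return x
end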
Solution of least squares by QR factorization
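This listing is also omitted. Since Section 3.3 below calls the function as FNC.lsqrfact, here is a minimal sketch of what a QR-based least-squares solver of that form might look like; the internal details are assumptions.
function lsqrfact(A, b)
    # assumes `using LinearAlgebra` for qr
    Q, R = qr(A)
    Q̂ = Matrix(Q)             # thin Q factor
    c = Q̂' * b                # transformed right-hand side
    x = R \ c                 # backward substitution with the triangular R
    return x
end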
QR factorization by Householder reflections
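Likewise omitted; a compact sketch of Householder-based QR in the style of the Section 3.4 demo below (the name qrfact and the details are assumptions) is:
function qrfact(A)
    # assumes `using LinearAlgebra` for I, norm, normalize, triu
    m, n = size(A)
    R = float(copy(A))                  # working copy that becomes upper triangular
    Qt = Matrix{Float64}(I, m, m)       # accumulates the product of reflectors
    for k in 1:n
        z = R[k:m, k]
        v = normalize(z - norm(z) * [1; zeros(m-k)])   # Householder vector (no sign safeguard, as in the demo below)
        R[k:m, :] -= 2v * (v' * R[k:m, :])             # apply the reflector to the active rows
        Qt[k:m, :] -= 2v * (v' * Qt[k:m, :])
    end
    return Qt', triu(R)
end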
Examples
Section 3.1
Interpolating temperature data
Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951–1980 average (source: NASA).
year = 1955:5:2000
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]
scatter(year, temp, label="data",
xlabel="year", ylabel="anomaly (degrees C)", leg=:bottomright)
A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
t = @. (year - 1950) / 10
n = length(t)
V = [ t[i]^j for i in 1:n, j in 0:n-1 ]
c = V \ temp
The coefficients in vector c are used to create a polynomial. Then we create a function that evaluates the polynomial after changing the time variable as we did for the Vandermonde matrix.
If you plot a function, then the points are chosen automatically to make a smooth curve.
p = Polynomial(c)
f = yr -> p((yr - 1950) / 10)
plot!(f, 1955, 2000, label="interpolant")
As you can see, the interpolant does represent the data, in a sense. However, it’s a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as overfitting.
Fitting temperature data
Here are the 5-year temperature averages again.
year = 1955:5:2000
t = @. (year-1950)/10
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]
The standard best-fit line results from using a linear polynomial that meets the least-squares criterion.
Backslash solves overdetermined linear systems in a least-squares sense.
V = [ t.^0 t ] # Vandermonde-ish matrix
@show size(V)
c = V \ temp
p = Polynomial(c)
f = yr -> p((yr - 1950) / 10)
scatter(year, temp, label="data",
xlabel="year", ylabel="anomaly (degrees C)", leg=:bottomright)
plot!(f, 1955, 2000, label="linear fit")
If we use a global cubic polynomial, the points are fit more closely.
V = [ t[i]^j for i in 1:length(t), j in 0:3 ]
@show size(V);
Now we solve the new least-squares problem to redefine the fitting polynomial.
The definition of f above is in terms of p. When p is changed, then f calls the new version.
p = Polynomial( V \ temp )
plot!(f, 1955, 2000, label="cubic fit")
If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
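As a quick illustration of this tradeoff (this loop is an addition, not part of the original demo), we can watch the residual norm shrink as the degree grows:
for deg in 1:6
    V = [ t[i]^j for i in 1:length(t), j in 0:deg ]
    c = V \ temp
    @show deg, norm(V*c - temp)    # norm is from LinearAlgebra; residual shrinks as degree grows
end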
Fitting a power law
One famous fact is that $\pi^2/6 = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots$. If $s_k$ is the sum of the first $k$ terms of this series and $p_k = \sqrt{6 s_k}$, then $p_k \to \pi$ as $k \to \infty$.
a = [1/k^2 for k=1:100]
s = cumsum(a) # cumulative summation
p = @. sqrt(6*s)
scatter(1:100, p, title="Sequence convergence",
xlabel=L"k", ylabel=L"p_k")
This graph suggests that maybe $p_k \to \pi$, but it’s far from clear how close the sequence gets. It’s more informative to plot the sequence of errors, $\epsilon_k = |\pi - p_k|$. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.
ϵ = @. abs(π - p) # error sequence
scatter(1:100, ϵ, title="Convergence of errors",
xaxis=(:log10,L"k"), yaxis=(:log10,"error"))
The straight line on the log-log scale suggests a power-law relationship where $\epsilon_k \approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.
k = 1:100
V = [ k.^0 log.(k) ] # fitting matrix
c = V \ log.(ϵ) # coefficients of linear fit
In terms of the parameters $a$ and $b$ used above, we have
a, b = exp(c[1]), c[2];
@show b;
It’s tempting to conjecture that the slope $b \to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.
plot!(k, a * k.^b, l=:dash, label="power-law fit")
Section 3.2
Instability in the normal equations
Because the functions $\sin^2(t)$, $\cos^2(t)$, and 1 are linearly dependent, we should find that the following matrix is somewhat ill-conditioned.
The local variable scoping rule for loops applies to comprehensions as well.
t = range(0, 3, 400)
f = [ x->sin(x)^2, x->cos((1+1e-7)*x)^2, x->1. ]
A = [ f(t) for t in t, f in f ]
@show κ = cond(A);
Now we set up an artificial linear least-squares problem with a known exact solution that actually makes the residual zero.
x = [1., 2, 1]
b = A * x;
Using backslash to find the least-squares solution, we get a relative error that is well below κ times machine epsilon.
x_BS = A \ b
@show observed_error = norm(x_BS - x) / norm(x);
@show error_bound = κ * eps();
If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa^2 \approx 10^{14}$, we may not be left with more than about 2 accurate digits.
N = A' * A
x_NE = N \ (A'*b)
@show observed_err = norm(x_NE - x) / norm(x);
@show digits = -log10(observed_err);
Section 3.3
QR factorization
Julia provides access to both the thin and full forms of the QR factorization.
A = rand(1.:9., 6, 4)
@show m,n = size(A);
Here is a standard call:
Q,R = qr(A);
Q
R
If you look carefully, you see that we seemingly got a full $\mathbf{Q}$ but a thin $\mathbf{R}$. However, the $\mathbf{Q}$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form.
To enter the accented character Q̂
, type Q\hat
followed by Tab.
Q̂ = Matrix(Q)
We can test that $\mathbf{Q}$ is an orthogonal matrix:
opnorm(Q' * Q - I)
The thin $\hat{\mathbf{Q}}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC (its columns are orthonormal):
Q̂' * Q̂ - I
Stability of least-squares via QR
We’ll repeat the experiment of Demo 3.2.1, which exposed instability in the normal equations.
t = range(0, 3, 400)
f = [ x->sin(x)^2, x->cos((1+1e-7)*x)^2, x->1. ]
A = [ f(t) for t in t, f in f ]
x = [1., 2, 1]
b = A * x;
The error in the solution by Function 3.3.2 is similar to the bound predicted by the condition number.
observed_error = norm(FNC.lsqrfact(A,b) - x) / norm(x);
@show observed_error;
@show error_bound = cond(A) * eps();
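For contrast (this check is an addition to the demo), solving the same problem through the normal equations reproduces the much larger error seen in Demo 3.2.1:
x_NE = (A'*A) \ (A'*b)    # normal-equations solution, for comparison only
@show norm(x_NE - x) / norm(x);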
Section 3.4
Householder QR factorization
We will use Householder reflections to produce a QR factorization of a random matrix.
The rand function can select randomly from within the interval $[0,1)$, or from a vector or range that you specify.
A = rand(float(1:9),6,4)
m,n = size(A)
Our first step is to introduce zeros below the diagonal in column 1 by using (3.4.4) and (3.4.1).
`I` can stand for an identity matrix of any size, inferred from the context when needed.
z = A[:, 1];
v = normalize(z - norm(z) * [1; zeros(m-1)])
P₁ = I - 2v * v' # reflector
We check that this reflector introduces zeros as it should:
P₁ * z
Now we replace $\mathbf{A}$ by $\mathbf{P}_1 \mathbf{A}$.
A = P₁ * A
We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. So we leave it out of the next reflector.
z = A[2:m, 2]
v = normalize(z - norm(z) * [1; zeros(m-2)])
P₂ = I - 2v * v'
We now apply this reflector to rows 2 and below only.
A[2:m, :] = P₂ * A[2:m, :]
A
We need to iterate the process for the last two columns.
for j in 3:n
z = A[j:m, j]
v = normalize(z - norm(z) * [1; zeros(m-j)])
P = I - 2v * v'
A[j:m, :] = P * A[j:m, :]
end
We have now reduced the original $\mathbf{A}$ to an upper triangular matrix using four orthogonal Householder reflections:
R = triu(A)
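As a final sanity check (an addition to the demo), the entries below the diagonal of A are already at roundoff level, so triu merely cleans them up:
norm(tril(A, -1))    # tril is from LinearAlgebra; this should be near machine epsilon in size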