
Chapter 8

Functions

Power iteration
poweriter.jl
"""
    poweriter(A, numiter)

Perform `numiter` power iterations with the matrix `A`, starting
from a random vector. Returns a vector of eigenvalue estimates
and the final eigenvector approximation.
"""
function poweriter(A, numiter)
    n = size(A, 1)
    x = normalize(randn(n), Inf)
    β = zeros(numiter)
    for k in 1:numiter
        y = A * x
        m = argmax(abs.(y))
        β[k] = y[m] / x[m]
        x = y / y[m]
    end
    return β, x
end
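As a quick sanity check (not one of the book's demos), we might apply poweriter to a small illustrative matrix and compare the final estimate against the dominant eigenvalue from eigvals, assuming the function above has been defined in the current session; the matrix below is arbitrary.

using LinearAlgebra
A = [2.0 1 0; 1 3 1; 0 1 4]        # small symmetric test matrix (illustrative)
β, x = poweriter(A, 40)
@show β[end];                      # power iteration's estimate of the dominant eigenvalue
@show maximum(abs.(eigvals(A)));   # reference value for comparison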
Inverse iteration
inviter.jl
"""
    inviter(A, s, numiter)

Perform `numiter` inverse iterations with the matrix `A` and shift
`s`, starting from a random vector. Returns a vector of
eigenvalue estimates and the final eigenvector approximation.
"""
function inviter(A, s, numiter)
    n = size(A, 1)
    x = normalize(randn(n), Inf)
    β = zeros(numiter)
    fact = lu(A - s * I)
    for k in 1:numiter
        y = fact \ x
        normy, m = findmax(abs.(y))
        β[k] = x[m] / y[m] + s
        x = y / y[m]
    end
    return β, x
end
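A similar quick check for inviter (again, not from the book's demos): with a shift near a known eigenvalue, the iteration should lock onto the eigenvalue closest to that shift.

using LinearAlgebra
A = diagm([1.0, 3.0, 6.0]) + triu(ones(3, 3), 1)   # upper triangular with eigenvalues 1, 3, 6 (illustrative)
β, x = inviter(A, 2.8, 20)         # the shift 2.8 is closest to the eigenvalue 3
@show β[end];                      # should approach 3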
Arnoldi iteration
arnoldi.jl
"""
    arnoldi(A, u, m)

Perform the Arnoldi iteration for `A` starting with vector `u`, out
to the Krylov subspace of degree `m`. Returns the orthonormal basis
(`m`+1 columns) and the upper Hessenberg `H` of size `m`+1 by `m`.
"""
function arnoldi(A, u, m)
    n = length(u)
    Q = zeros(n, m+1)
    H = zeros(m+1, m)
    Q[:, 1] = u / norm(u)
    for j in 1:m
        # Find the new direction that extends the Krylov subspace.
        v = A * Q[:, j]
        # Remove the projections onto the previous vectors.
        for i in 1:j
            H[i, j] = dot(Q[:, i], v)
            v -= H[i, j] * Q[:, i]
        end
        # Normalize and store the new basis vector.
        H[j+1, j] = norm(v)
        Q[:, j+1] = v / H[j+1, j]
    end
    return Q, H
end
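One way to validate the output (not part of the book's demos) is the Arnoldi identity $\mathbf{A}\mathbf{Q}_m = \mathbf{Q}_{m+1}\mathbf{H}_m$, which should hold down to roundoff.

using LinearAlgebra
A = rand(40, 40)                   # illustrative matrix
u = randn(40)
Q, H = arnoldi(A, u, 8)
@show norm(A * Q[:, 1:8] - Q * H); # should be within roundoff of zero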
GMRES
gmres.jl
"""
    gmres(A, b, m)

Do `m` iterations of GMRES for the linear system `A`*x=`b`. Returns
the final solution estimate x and a vector with the history of
residual norms. (This function is for demo only, not practical use.)
"""
function gmres(A, b, m)
    n = length(b)
    Q = zeros(n, m+1)
    Q[:, 1] = b / norm(b)
    H = zeros(m+1, m)

    # Initial solution is zero.
    x = 0
    residual = [norm(b); zeros(m)]

    for j in 1:m
        # Next step of Arnoldi iteration.
        v = A * Q[:, j]
        for i in 1:j
            H[i, j] = dot(Q[:, i], v)
            v -= H[i, j] * Q[:, i]
        end
        H[j+1, j] = norm(v)
        Q[:, j+1] = v / H[j+1, j]

        # Solve the minimum residual problem.
        r = [norm(b); zeros(j)]
        z = H[1:j+1, 1:j] \ r
        x = Q[:, 1:j] * z
        residual[j+1] = norm(A * x - b)
    end
    return x, residual
end
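A minimal usage sketch (not one of the book's demos): the residual norms recorded by this gmres should be non-increasing, and the final iterate can be compared against a direct solve. The test matrix mirrors the construction used in the examples below.

using LinearAlgebra
λ = @. 10 + (1:60)
A = triu(rand(60, 60), 1) + diagm(λ)   # upper triangular, eigenvalues 11 through 70
b = rand(60)
x, residual = gmres(A, b, 20)
@show residual[end] / norm(b);          # relative residual after 20 iterations
@show norm(x - A \ b) / norm(A \ b);    # relative error vs. a direct solve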

Examples

8.1 Sparsity and structure

Example 8.1.1

Here we load the adjacency matrix of a graph with 2790 nodes. Each node is a web page referring to Roswell, NM, and the edges represent links between web pages. (Credit goes to Panayiotis Tsaparas and the University of Toronto for making this data public.)

using SparseArrays, JLD2
@load "roswell.jld2" A;      # file is on the book's website

We may define the density of $\mathbf{A}$ as the number of nonzeros divided by the total number of entries.

m, n = size(A)
@show density = nnz(A) / (m * n);
density = nnz(A) / (m * n) = 0.0010902994565845762

The computer memory consumed by any variable can be discovered using summarysize. We can use it to compare the space needed for the sparse representation to its dense counterpart, that is, the space needed to store all the elements, whether zero or not.

F = Matrix(A)
Base.summarysize(F) / Base.summarysize(A)
550.2690513219285

As you can see, the storage savings are dramatic. Matrix-vector products are also much faster using the sparse form because operations with structural zeros are skipped.

x = randn(n)
A * x;   # make sure * is loaded and compiled
@elapsed for i in 1:300
    A * x
end
0.049605833
F * x;
@elapsed for i in 1:300
    F * x
end
14.46557525
Example 8.1.2

Here is the adjacency matrix of a graph representing a small-world network, featuring connections to neighbors and a small number of distant contacts.

using GraphRecipes
@load "smallworld.jld2" A
graphplot(A, linealpha=0.5)

Because each node connects to relatively few others, the adjacency matrix is quite sparse.

spy(A, title="Nonzero locations", m=2, color=:blues)

By Theorem 7.1.1, the entries of $\mathbf{A}^k$ give the number of walks of length $k$ between pairs of nodes, as with “$k$ degrees of separation” within a social network. As $k$ grows, the density of $\mathbf{A}^k$ also grows.

plt = plot(layout=(1, 3), legend=:none, size=(600, 240))
for k in 2:4
    spy!(A^k;
        subplot=k - 1, color=:blues,
        title=latexstring("\\mathbf{A}^$k"))
end
plt
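As a small spot check of this interpretation (not in the original demo), the $(i,j)$ entry of $\mathbf{A}^2$ should equal $\sum_k A_{ik}A_{kj}$, the number of two-step walks from node $i$ to node $j$; the node indices below are arbitrary.

i, j = 1, 4
@show (A^2)[i, j];
@show sum(A[i, :] .* A[:, j]);   # the same count, computed directly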
Example 8.1.3

The spdiagm function creates a sparse matrix given its diagonal elements. The main or central diagonal is numbered zero, above and to the right of that is positive, and below and to the left is negative.

n = 50;
A = spdiagm(-3 => fill(n, n - 3),
    0 => ones(n),
    1 => -(1:n-1),
    5 => fill(0.1, n - 5))
Matrix(A[1:7, 1:7])
7×7 Matrix{Float64}:
  1.0  -1.0   0.0   0.0   0.0   0.1   0.0
  0.0   1.0  -2.0   0.0   0.0   0.0   0.1
  0.0   0.0   1.0  -3.0   0.0   0.0   0.0
 50.0   0.0   0.0   1.0  -4.0   0.0   0.0
  0.0  50.0   0.0   0.0   1.0  -5.0   0.0
  0.0   0.0  50.0   0.0   0.0   1.0  -6.0
  0.0   0.0   0.0  50.0   0.0   0.0   1.0

Without pivoting, the LU factors have the same lower and upper bandwidth as the original matrix.

L, U = FNC.lufact(A)
plot(layout=2)
spy!(sparse(L), m=2, subplot=1, title=L"\mathbf{L}", color=:blues)
spy!(sparse(U), m=2, subplot=2, title=L"\mathbf{U}", color=:blues)

However, if we introduce row pivoting, bandedness may be expanded or destroyed.

fact = lu(A)
plot(layout=2)
spy!(sparse(fact.L), m=2, subplot=1, title=L"\mathbf{L}", color=:blues)
spy!(sparse(fact.U), m=2, subplot=2, title=L"\mathbf{U}", color=:blues)
Example 8.1.4

The following generates a random sparse matrix with prescribed eigenvalues.

n = 4000
density = 4e-4
λ = @. 1 + 1 / (1:n)   # exact eigenvalues
A = FNC.sprandsym(n, density, λ);

The eigs function from Arpack finds a small number of eigenvalues meeting some criterion. First, we ask for the 5 eigenvalues of largest (complex) magnitude using which=:LM.

using Arpack
λmax, V = eigs(A, nev=5, which=:LM)    # Largest Magnitude
fmt = ft_printf("%20.15f")
pretty_table([λmax λ[1:5]], header=["found", "exact"], formatters=fmt)
┌──────────────────────┬──────────────────────┐
│                found │                exact │
├──────────────────────┼──────────────────────┤
│    2.000000000000002 │    2.000000000000000 │
│    1.500000000000005 │    1.500000000000000 │
│    1.333333333333334 │    1.333333333333333 │
│    1.250000000000001 │    1.250000000000000 │
│    1.199999999999999 │    1.200000000000000 │
└──────────────────────┴──────────────────────┘

Now we find the 5 closest to the value 1 in the complex plane, via sigma=1.

λ1, V = eigs(A, nev=5, sigma=1)    # closest to sigma
data = [λ1 λ[end:-1:end-4]]
pretty_table(data, header=["found", "exact"], formatters=fmt)
┌──────────────────────┬──────────────────────┐
│                found │                exact │
├──────────────────────┼──────────────────────┤
│    1.000250000000000 │    1.000250000000000 │
│    1.000250062515629 │    1.000250062515629 │
│    1.000250125062531 │    1.000250125062531 │
│    1.000250187640731 │    1.000250187640731 │
│    1.000250250250250 │    1.000250250250250 │
└──────────────────────┴──────────────────────┘

The time needed to solve a sparse linear system is not easy to predict unless you have some more information about the matrix. But it will typically be orders of magnitude faster than the dense version of the same problem.

x = @. 1 / (1:n);
b = A * x;
norm(x - A \ b);  # force compilation
t = @elapsed sparse_err = norm(x - A \ b)
println("Time for sparse solve: $t")
Time for sparse solve: 0.276979542
D = Matrix(A)  # convert to regular matrix
norm(x - D \ b);
t = @elapsed dense_err = norm(x - D \ b)
println("Time for dense solve: $t")
Time for dense solve: 3.257475875
@show sparse_err;
@show dense_err;
sparse_err = 5.641517106242322e-17

dense_err = 1.654403346772913e-17

8.2 Power iteration

Example 8.2.1

Here we choose a random 5×5 matrix and a random 5-vector.

A = rand(1.0:9.0, 5, 5)
A = A ./ sum(A, dims=1)
x = randn(5)
5-element Vector{Float64}:
 -2.051598006971385
 -2.409741337932436
  0.4319487870679362
  0.6624067614868305
 -0.0899512129133455

Applying matrix-vector multiplication once doesn’t do anything recognizable.

y = A * x
5-element Vector{Float64}:
 -0.555495893170436
 -0.5786853253668668
 -0.6831248064243652
 -1.3818487318911952
 -0.25778025240953634

Repeating the multiplication still doesn’t do anything obvious.

z = A * y
5-element Vector{Float64}:
 -0.8091431080089553
 -0.7562880842369648
 -0.4768281542191334
 -0.5889868825578362
 -0.8256887802395099

But if we keep repeating the matrix-vector multiplication, something remarkable happens: $\mathbf{A} \mathbf{x} \approx \mathbf{x}$.

for j in 1:8
    x = A * x
end
[x A * x]
5×2 Matrix{Float64}:
 -0.682612  -0.682518
 -0.616908  -0.61682
 -0.552805  -0.552858
 -0.717918  -0.718057
 -0.886691  -0.886682

This phenomenon seems to occur regardless of the starting vector.

x = randn(5)
for j in 1:8
    x = A * x
end
[x A * x]
5×2 Matrix{Float64}:
 -0.161391  -0.161364
 -0.145862  -0.145832
 -0.130689  -0.130708
 -0.169732  -0.169768
 -0.209628  -0.20963
Example 8.2.2

We will experiment with the power iteration on a 5×5 matrix with prescribed eigenvalues and dominant eigenvalue at 1.

λ = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5, 5), 1) + diagm(λ)
5×5 Matrix{Float64}:
 1.0   1.0   1.0   1.0  1.0
 0.0  -0.75  1.0   1.0  1.0
 0.0   0.0   0.6   1.0  1.0
 0.0   0.0   0.0  -0.4  1.0
 0.0   0.0   0.0   0.0  0.0

We run the power iteration 60 times. The best estimate of the dominant eigenvalue is the last entry of the first output.

β, x = FNC.poweriter(A, 60)
eigval = β[end]
1.000000003344092

We check for linear convergence using a log-linear plot of the error.

using Plots
err = @. 1 - β
plot(0:59, abs.(err); m=:o, 
    xlabel=L"k",  
    yaxis=(L"|\lambda_1-\beta_k|", :log10, [1e-10, 1]),
    title="Convergence of power iteration")

The asymptotic trend seems to be a straight line, consistent with linear convergence. To estimate the convergence rate, we look at the ratio of two consecutive errors in the linear part of the convergence curve. The ratio of the first two eigenvalues should match the observed rate.

@show theory = λ[2] / λ[1];
@show observed = err[40] / err[39];
theory = λ[2] / λ[1] = -0.75
observed = err[40] / err[39] = -0.7520004611727363

Note that the error is supposed to change sign on each iteration. The effect of these alternating signs is that estimates oscillate around the exact value.

β[26:30]
5-element Vector{Float64}:
 1.0000607762580542
 0.9999565697313072
 1.0000338636650483
 0.9999753768143407
 1.0000189320772697

In practical situations, we don’t know the exact eigenvalue that the algorithm is supposed to find. In that case we would base errors on the final β that was found, as in the following plot.

err = @. β[end] - β[1:end-1]
plot(0:58, abs.(err), m=:o, 
    xlabel=L"k", 
    yaxis=(L"|\beta_{60}-\beta_k|", :log10, [1e-10, 1]),
    title="Convergence of power iteration")

The results are very similar until the last few iterations, when the limited accuracy of the reference value begins to show. That is, while it is a good estimate of $\lambda_1$, it is less good as an estimate of the error in nearby estimates.

8.3 Inverse iteration

Example 8.3.1

We set up a $5\times 5$ triangular matrix with prescribed eigenvalues on its diagonal.

λ = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5, 5), 1) + diagm(λ)
5×5 Matrix{Float64}:
 1.0   1.0   1.0   1.0  1.0
 0.0  -0.75  1.0   1.0  1.0
 0.0   0.0   0.6   1.0  1.0
 0.0   0.0   0.0  -0.4  1.0
 0.0   0.0   0.0   0.0  0.0

We run inverse iteration with the shift $s=0.7$ and take the final estimate as our “exact” answer to observe the convergence.

s = 0.7
β, x = FNC.inviter(A, s, 30)
eigval = β[end]
0.5999999999999984

As expected, the eigenvalue that was found is the one closest to 0.7. The convergence is again linear.

using Plots
err = @. abs(eigval - β)
plot(0:28, err[1:end-1];
    m=:o,  xlabel=L"k", 
    yaxis=(L"|\lambda_3-\beta_k|", :log10, [1e-16, 1]),
    title="Convergence of inverse iteration")

The observed linear convergence rate is found from the data.

@show observed_rate = err[22] / err[21];
observed_rate = err[22] / err[21] = 0.33326844048322757

We reorder the eigenvalues to enforce (8.3.3).

λ = λ[sortperm(abs.(λ .- s))]
5-element Vector{Float64}:
  0.6
  1.0
  0.0
 -0.4
 -0.75

Hence the theoretical convergence rate is

@show theoretical_rate = (λ[1] - s) / (λ[2] - s);
theoretical_rate = (λ[1] - s) / (λ[2] - s) = -0.3333333333333332
Example 8.3.2
λ = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5, 5), 1) + diagm(λ)
5×5 Matrix{Float64}:
 1.0   1.0   1.0   1.0  1.0
 0.0  -0.75  1.0   1.0  1.0
 0.0   0.0   0.6   1.0  1.0
 0.0   0.0   0.0  -0.4  1.0
 0.0   0.0   0.0   0.0  0.0

We begin with a shift $s=0.7$, which is closest to the eigenvalue 0.6.

s = 0.7
x = ones(5)
y = (A - s * I) \ x
β = x[1] / y[1] + s
0.7034813925570228

Note that the result is not yet any closer to the targeted 0.6. But we proceed (without being too picky about normalization here).

s = β
x = y / y[1]
y = (A - s * I) \ x
β = x[1] / y[1] + s
0.5612761406172997

Still not much apparent progress. However, in just a few more iterations the results are dramatically better.

for k in 1:4
    s = β
    x = y / y[1]
    y = (A - s * I) \ x
    @show β = x[1] / y[1] + s
end
β = x[1] / y[1] + s = 0.5964312884753865
β = x[1] / y[1] + s = 0.5999717091820104
β = x[1] / y[1] + s = 0.5999999978556353
β = x[1] / y[1] + s = 0.6

8.4 Krylov subspaces

Example 8.4.1

First we define a triangular matrix with known eigenvalues, and a random vector $b$.

λ = @. 10 + (1:100)
A = triu(rand(100, 100), 1) + diagm(λ)
b = rand(100);

Next we build up the first 30 Krylov matrices iteratively, using renormalization after each matrix-vector multiplication.

Km = [b zeros(100, 29)]
for m in 1:29
    v = A * Km[:, m]
    Km[:, m+1] = v / norm(v)
end

Now we solve least-squares problems for Krylov matrices of increasing dimension, recording the residual in each case.

resid = zeros(30)
for m in 1:30
    z = (A * Km[:, 1:m]) \ b
    x = Km[:, 1:m] * z
    resid[m] = norm(b - A * x)
end

The linear system approximations show smooth linear convergence at first, but the convergence stagnates after only a few digits have been found.

using Plots
plot(0:29, resid; m=:o,
    xaxis=(L"m"), yaxis=(:log10, L"\| b-Ax_m \|"),
    title="Residual for linear systems", legend=:none)
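The stagnation is consistent with the ill-conditioning of the Krylov matrix: as the dimension grows, its columns become nearly parallel, and the least-squares solutions lose accuracy. A quick diagnostic (not part of the original demo) is to look at the condition numbers directly.

@show cond(Km[:, 1:10]);
@show cond(Km[:, 1:20]);
@show cond(Km[:, 1:30]);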
Example 8.4.2

We illustrate a few steps of the Arnoldi iteration for a small matrix.

A = rand(1.0:9.0, 6, 6)
6×6 Matrix{Float64}:
 9.0  4.0  7.0  4.0  3.0  7.0
 7.0  2.0  7.0  6.0  4.0  3.0
 4.0  8.0  8.0  4.0  3.0  9.0
 4.0  3.0  8.0  8.0  1.0  4.0
 1.0  5.0  1.0  4.0  3.0  7.0
 2.0  4.0  3.0  1.0  4.0  5.0

The seed vector we choose here determines the first member of the orthonormal basis.

u = randn(6)
Q = u / norm(u);

Multiplication by $\mathbf{A}$ gives us a new vector in $\mathcal{K}_2$.

Aq = A * Q[:, 1];

We subtract off its projection in the previous direction. The remainder is rescaled to give us the next orthonormal column.

v = Aq - dot(Q[:, 1], Aq) * Q[:, 1]
Q = [Q v / norm(v)];

On the next pass, we have to subtract off the projections in two previous directions.

Aq = A * Q[:, 2]
v = Aq - dot(Q[:, 1], Aq) * Q[:, 1] - dot(Q[:, 2], Aq) * Q[:, 2]
Q = [Q v / norm(v)];

At every step, $\mathbf{Q}_m$ is an ONC matrix.

@show opnorm(Q' * Q - I);
opnorm(Q' * Q - I) = 1.2267554763961925e-16

And $\mathbf{Q}_m$ spans the same space as the three-dimensional Krylov matrix.

K = [u A * u A * A * u];
@show rank([Q K]);
rank([Q K]) = 3

8.5 GMRES

Example 8.5.1

We define a triangular matrix with known eigenvalues and a random vector $\mathbf{b}$.

λ = @. 10 + (1:100)
A = triu(rand(100, 100), 1) + diagm(λ)
b = rand(100);

Instead of building the Krylov matrices, we use the Arnoldi iteration to generate equivalent orthonormal vectors.

Q, H = FNC.arnoldi(A, b, 60);

The Arnoldi bases are used to solve the least-squares problems defining the GMRES iterates.

resid = [norm(b); zeros(60)]
for m in 1:60
    s = [norm(b); zeros(m)]
    z = H[1:m+1, 1:m] \ s
    x = Q[:, 1:m] * z
    resid[m+1] = norm(b - A * x)
end

The approximations converge smoothly, practically all the way to machine epsilon.

using Plots
plot(0:60, resid, m=:o,
    xaxis=(L"m"),  yaxis=(:log10, "norm of mth residual"),
    title="Residual for GMRES",  legend=:none)
Example 8.5.2

The following experiments are based on a matrix resulting from discretization of a partial differential equation.

A = FNC.poisson(50)
n = size(A, 1)
b = ones(n);
spy(A, color=:blues)

We compare unrestarted GMRES with three different thresholds for restarting. Here we are using gmres from the IterativeSolvers package, since our simple implementation does not offer restarting.

using IterativeSolvers
reltol = 1e-12;
plt = plot(title="Convergence of restarted GMRES", legend=:bottomleft,
    xaxis=(L"m"),  yaxis=(:log10, "residual norm", [1e-8, 100]))

for restart in [n, 20, 40, 60]
    x, hist = IterativeSolvers.gmres(A, b; restart, reltol,
        maxiter=100, log=true)
    plot!(hist[:resnorm], label="restart = $restart")
end

plt

The “pure” GMRES curve is the lowest one. All of the other curves agree with it until the first restart. Decreasing the restart value makes the convergence per iteration generally worse, but the time required per iteration smaller as well.
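To see the time side of the tradeoff, one could also time the solver for a couple of restart values (a rough sketch, not part of the original demo; the timings depend on the machine and include compilation on the first call).

for restart in [20, n]
    t = @elapsed IterativeSolvers.gmres(A, b; restart, reltol, maxiter=100)
    println("restart = $restart:  time = $(round(t, digits=3)) s")
end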

8.6 MINRES and conjugate gradients

Example 8.6.2

The following matrix is indefinite.

A = FNC.poisson(10) - 20I
λ = eigvals(Matrix(A))
isneg = @. λ < 0
@show sum(isneg), sum(.!isneg);
(sum(isneg), sum(.!(isneg))) = (13, 87)

We can compute the relevant quantities from Theorem 8.6.1.

mn, mx = extrema(-λ[isneg])
κ₋ = mx / mn
mn, mx = extrema(λ[.!isneg])
κ₊ = mx / mn
ρ = (sqrt(κ₋ * κ₊) - 1) / (sqrt(κ₋ * κ₊) + 1)
0.9026418585584018

Because the iteration number $m$ is halved in (8.6.4), the rate constant of linear convergence is the square root of this number, which makes it even closer to 1.
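For instance (a one-line check, not in the original demo), we can compare the bound's base with its square root:

@show ρ, sqrt(ρ);   # the effective rate √ρ lies closer to 1 than ρ does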

Now we apply MINRES to a linear system with this matrix, and compare the observed convergence to the upper bound from the theorem.

using IterativeSolvers, Plots
b = rand(100)
x, hist = minres(A, b, reltol=1e-10, maxiter=51, log=true);
relres = hist[:resnorm] / norm(b)
m = 0:length(relres)-1
plot(m, relres;
    label="observed", legend=:left,
    xaxis=L"m",  yaxis=(:log10, "relative residual"),
    title=("Convergence of MINRES"))
plot!(m, ρ .^ (m / 2), l=:dash, label="upper bound")

The upper bound turns out to be pessimistic here, especially in the later iterations. However, you can see a slow linear phase in the convergence that tracks the bound closely.

Example 8.6.3

We will compare MINRES and CG on some quasi-random SPD problems. The first matrix has a condition number of 100.

n = 5000
density = 0.001
A = FNC.sprandsym(n, density, 1 / 100)
x = (1:n) / n
b = A * x;

Now we apply both methods and compare the convergence of the system residuals, using implementations imported from IterativeSolvers.

plt = plot(title="Convergence of MINRES and CG",
    xaxis=("Krylov dimension"),  yaxis=(:log10, "relative residual norm"))
for method in [minres, cg]
    x̃, history = method(A, b, reltol=1e-6, maxiter=1000, log=true)
    relres = history[:resnorm] / norm(b)
    plot!(0:length(relres)-1, relres, label="$method")
    err = round(norm(x̃ - x) / norm(x), sigdigits=4)
    println("$method error: $err")
end
plt

There is little difference between the two methods here. Next, we increase the condition number of the matrix by a factor of 25. The rule of thumb predicts that the number of iterations required should increase by a factor of about 5.
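The prediction comes from the square-root dependence of the iteration count on the condition number (a quick check, not in the original demo):

@show sqrt(2500) / sqrt(100);   # predicted growth factor in iterations, ≈ 5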

A = FNC.sprandsym(n, density, 1 / 2500)
b = A * x;
plt = plot(title="Convergence of MINRES and CG",
    xaxis=("Krylov dimension"), yaxis=(:log10, "relative residual norm"))
for method in [minres, cg]
    x̃, history = method(A, b, reltol=1e-6, maxiter=1000, log=true)
    relres = history[:resnorm] / norm(b)
    plot!(0:length(relres)-1, relres, label="$method")
    err = round(norm(x̃ - x) / norm(x), sigdigits=4)
    println("$method error: $err")
end
plt

Both methods have an early superlinear phase that allows them to finish slightly sooner than the predicted factor of 5: Theorem 8.6.2 is an upper bound, not necessarily an approximation. Both methods ultimately achieve the same reduction in the residual; MINRES stops earlier, but with a slightly larger error.

8.7 Matrix-free iterations

Example 8.7.1

We use a readily available test image.

using Images, TestImages
img = testimage("mandrill")
m, n = size(img)
X = @. Float64(Gray(img))
plot(Gray.(X), title="Original image", aspect_ratio=1)

We define the one-dimensional tridiagonal blurring matrices.

using SparseArrays
function blurmatrix(d)
    v1 = fill(0.25, d - 1)
    return spdiagm(0 => fill(0.5, d), 1 => v1, -1 => v1)
end
B, C = blurmatrix(m), blurmatrix(n);

Finally, we show the results of using $k=12$ repetitions of the blur in each direction.

using Plots
blur = X -> B^12 * X * C^12;
Z = blur(X)
plot(Gray.(Z), title="Blurred image")
Example 8.7.2

We repeat the earlier process to blur an original image $\mathbf{X}$ to get $\mathbf{Z}$.

img = testimage("lighthouse")
m, n = size(img)
X = @. Float64(Gray(img))

B = spdiagm(0 => fill(0.5, m),
    1 => fill(0.25, m - 1), -1 => fill(0.25, m - 1))
C = spdiagm(0 => fill(0.5, n),
    1 => fill(0.25, n - 1), -1 => fill(0.25, n - 1))
blur = X -> B^12 * X * C^12
Z = blur(X)
plot(Gray.(Z), title="Blurred image")

Now we imagine that $\mathbf{X}$ is unknown and that we want to recover it from $\mathbf{Z}$. We first need functions that translate between vector and matrix representations.

# vec (built-in) converts matrix to vector
unvec = z -> reshape(z, m, n);  # convert vector to matrix

Now we declare the three-step blur transformation as a LinearMap, supplying also the size of the vector form of an image.

using LinearMaps
T = LinearMap(x -> vec(blur(unvec(x))), m * n);
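As a quick consistency check (not in the original demo), applying T to the vectorized image should reproduce exactly the result of blurring the matrix form and then vectorizing it.

@show norm(T * vec(X) - vec(blur(X)));   # should be exactly zero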

The blurring operators are symmetric, so we apply minres to the composite blurring transformation T.

using IterativeSolvers
y = minres(T, vec(Z), maxiter=50, reltol=1e-5);
Y = unvec(clamp01.(y))

plot(Gray.(X), layout=2, title="Original")
plot!(Gray.(Y), subplot=2, title="Deblurred")

8.8 Preconditioning

Example 8.8.1

Here is an SPD matrix that arises from solving partial differential equations.

using MatrixDepot
A = matrixdepot("wathen", 60)
n = size(A, 1)
@show n, nnz(A);
(n, nnz(A)) = (11041, 170161)

There is an easy way to use the diagonal elements of $\mathbf{A}$, or any other vector, as a diagonal preconditioner.

using Preconditioners
b = ones(n)
M = DiagonalPreconditioner(diag(A));

We now compare CG with and without the preconditioner.

using IterativeSolvers, Plots
plain(b) = cg(A, b, maxiter=200, reltol=1e-4, log=true)
time_plain = @elapsed x, hist1 = plain(b)
prec(b) = cg(A, b, Pl=M, maxiter=200, reltol=1e-4, log=true)
time_prec = @elapsed x, hist2 = prec(b)
@show time_plain, time_prec

rr = hist1[:resnorm]
plot(0:length(rr)-1, rr / rr[1], yscale=:log10, label="plain")
rr = hist2[:resnorm]
plot!(0:length(rr)-1, rr / rr[1], yscale=:log10, label="preconditioned")
title!("Diagonal preconditioning in CG")

The diagonal preconditioner cut down substantially on the number of iterations. The effect on the total time is less dramatic, but this is not a large version of the problem.

Example 8.8.2

Here is a nonsymmetric matrix arising from a probabilistic model in computational chemistry.

using SparseArrays
A = sparse(matrixdepot("Watson/chem_master1"))
n = size(A, 1)
@show n, nnz(A), issymmetric(A);
(n, nnz(A), issymmetric(A)) = (40401, 201201, false)


Without a preconditioner, GMRES makes essentially no progress after 100 iterations.

b = rand(n)
const GMRES = IterativeSolvers.gmres
x, history = GMRES(A, b, maxiter=100, reltol=1e-5, log=true)
resnorm = history[:resnorm]
@show resnorm[end] / resnorm[1];
resnorm[end] / resnorm[1] = 0.969059695989308

The following version of incomplete LU factorization drops all sufficiently small values (i.e., replaces them with zeros). This keeps the number of nonzeros in the factors under control.

using IncompleteLU
iLU = ilu(A, τ=0.25)
@show nnz(iLU) / nnz(A);
nnz(iLU) / nnz(A) = 9.654509669435043

The result is almost 10 times as dense as $\mathbf{A}$ and yet still not a true factorization of it. However, it’s close enough for an approximate inverse in a preconditioner. The actual preconditioning matrix is $\mathbf{M}=\mathbf{L}\mathbf{U}$, but we just supply the factorization to gmres.

_, history = GMRES(A, b, Pl=iLU, maxiter=100, reltol=1e-5, log=true)
history
Converged after 7 iterations.

The τ parameter in ilu balances the accuracy of the iLU factorization with the time needed to compute it and invert it. As $\tau\to 0$, more of the elements are kept, making the preconditioner more effective but slower per iteration.

plt = plot(0:40, resnorm[1:41] / resnorm[1];
    label="no preconditioning",  legend=:bottomright,
    xaxis=("iteration number"),
    yaxis=(:log10, "residual norm"),
    title="Incomplete LU preconditioning")
for τ in [2, 1, 0.25, 0.1]
    t = @elapsed iLU = ilu(A; τ)
    t += @elapsed _, history = GMRES(A, b, Pl=iLU, maxiter=100,
        reltol=1e-5, log=true)
    resnorm = history[:resnorm]
    label = "τ = $τ, time = $(round(t,digits=3))"
    plot!(0:length(resnorm)-1, resnorm / resnorm[1]; label)
end
plt

In any given problem, it’s impossible to know in advance where the right balance lies between fidelity and speed for the preconditioner.