Functions¶
Power iteration
""" poweriter(A, numiter) Perform `numiter` power iterations with the matrix `A`, starting from a random vector. Returns a vector of eigenvalue estimates and the final eigenvector approximation. """ function poweriter(A, numiter) n = size(A, 1) x = normalize(randn(n), Inf) β = zeros(numiter) for k in 1:numiter y = A * x m = argmax(abs.(y)) β[k] = y[m] / x[m] x = y / y[m] end return β, x end
Inverse iteration
""" inviter(A, s, numiter) Perform `numiter` inverse iterations with the matrix `A` and shift `s`, starting from a random vector. Returns a vector of eigenvalue estimates and the final eigenvector approximation. """ function inviter(A, s, numiter) n = size(A, 1) x = normalize(randn(n), Inf) β = zeros(numiter) fact = lu(A - s * I) for k in 1:numiter y = fact \ x normy, m = findmax(abs.(y)) β[k] = x[m] / y[m] + s x = y / y[m] end return β, x end
Arnoldi iteration
""" arnoldi(A, u, m) Perform the Arnoldi iteration for `A` starting with vector `u`, out to the Krylov subspace of degree `m`. Returns the orthonormal basis (`m`+1 columns) and the upper Hessenberg `H` of size `m`+1 by `m`. """ function arnoldi(A, u, m) n = length(u) Q = zeros(n, m+1) H = zeros(m+1, m) Q[:, 1] = u / norm(u) for j in 1:m # Find the new direction that extends the Krylov subspace. v = A * Q[:, j] # Remove the projections onto the previous vectors. for i in 1:j H[i, j] = dot(Q[:, i], v) v -= H[i, j] * Q[:, i] end # Normalize and store the new basis vector. H[j+1, j] = norm(v) Q[:, j+1] = v / H[j+1, j] end return Q, H end
GMRES
""" gmres(A, b, m) Do `m` iterations of GMRES for the linear system `A`*x=`b`. Returns the final solution estimate x and a vector with the history of residual norms. (This function is for demo only, not practical use.) """ function gmres(A, b, m) n = length(b) Q = zeros(n, m+1) Q[:, 1] = b / norm(b) H = zeros(m+1, m) # Initial solution is zero. x = 0 residual = [norm(b); zeros(m)] for j in 1:m # Next step of Arnoldi iteration. v = A * Q[:, j] for i in 1:j H[i, j] = dot(Q[:, i], v) v -= H[i, j] * Q[:, i] end H[j+1, j] = norm(v) Q[:, j+1] = v / H[j+1, j] # Solve the minimum residual problem. r = [norm(b); zeros(j)] z = H[1:j+1, 1:j] \ r x = Q[:, 1:j] * z residual[j+1] = norm(A * x - b) end return x, residual end
Examples¶
8.1 Sparsity and structure¶
Example 8.1.1
Here we load the adjacency matrix of a graph with 2790 nodes. Each node is a web page referring to Roswell, NM, and the edges represent links between web pages. (Credit goes to Panayiotis Tsaparas and the University of Toronto for making this data public.)
using SparseArrays, JLD2
@load "roswell.jld2" A; # file is on the book's website
We may define the density of $\mathbf{A}$ as the number of nonzeros divided by the total number of entries.
Tip
Use nnz to count the number of nonzeros in a sparse matrix.
m, n = size(A)
@show density = nnz(A) / (m * n);
density = nnz(A) / (m * n) = 0.0010902994565845762
The computer memory consumed by any variable can be discovered using summarysize. We can use it to compare the space needed for the sparse representation to its dense counterpart, that is, the space needed to store all the elements, whether zero or not.
F = Matrix(A)
Base.summarysize(F) / Base.summarysize(A)
550.2690513219285
As you can see, the storage savings are dramatic. Matrix-vector products are also much faster using the sparse form because operations with structural zeros are skipped.
x = randn(n)
A * x; # make sure * is loaded and compiled
@elapsed for i in 1:300
A * x
end
0.049605833
F * x;
@elapsed for i in 1:300
F * x
end
14.46557525
Example 8.1.2
Here is the adjacency matrix of a graph representing a small-world network, featuring connections to neighbors and a small number of distant contacts.
using GraphRecipes
@load "smallworld.jld2" A
graphplot(A, linealpha=0.5)
Because each node connects to relatively few others, the adjacency matrix is quite sparse.
spy(A, title="Nonzero locations", m=2, color=:blues)
By Theorem 7.1.1, the entries of $\mathbf{A}^k$ give the number of walks of length $k$ between pairs of nodes, as with “k degrees of separation” within a social network. As $k$ grows, the density of $\mathbf{A}^k$ also grows.
plt = plot(layout=(1, 3), legend=:none, size=(600, 240))
for k in 2:4
spy!(A^k;
subplot=k - 1, color=:blues,
title=latexstring("\\mathbf{A}^$k"))
end
plt
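To make the walk-counting interpretation concrete, here is a tiny hand-made check; the 3-node graph below is our own illustration, not part of the book's data.

using SparseArrays
# Path graph 2–1–3: node 1 is connected to nodes 2 and 3.
A3 = sparse([0 1 1; 1 0 0; 1 0 0])
# Exactly one walk of length 2 joins nodes 2 and 3 (namely 2 → 1 → 3).
@show (A3^2)[2, 3];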
Example 8.1.3
The spdiagm function creates a sparse matrix given its diagonal elements. The main or central diagonal is numbered zero; above and to the right of that is positive, and below and to the left is negative.
n = 50;
A = spdiagm(-3 => fill(n, n - 3),
0 => ones(n),
1 => -(1:n-1),
5 => fill(0.1, n - 5))
Matrix(A[1:7, 1:7])
7×7 Matrix{Float64}:
1.0 -1.0 0.0 0.0 0.0 0.1 0.0
0.0 1.0 -2.0 0.0 0.0 0.0 0.1
0.0 0.0 1.0 -3.0 0.0 0.0 0.0
50.0 0.0 0.0 1.0 -4.0 0.0 0.0
0.0 50.0 0.0 0.0 1.0 -5.0 0.0
0.0 0.0 50.0 0.0 0.0 1.0 -6.0
0.0 0.0 0.0 50.0 0.0 0.0 1.0
Without pivoting, the LU factors have the same lower and upper bandwidth as the original matrix.
Tip
The sparse function converts any matrix to sparse form. But it’s usually better to construct a sparse matrix directly, as the standard form might not fit in memory.
L, U = FNC.lufact(A)
plot(layout=2)
spy!(sparse(L), m=2, subplot=1, title=L"\mathbf{L}", color=:blues)
spy!(sparse(U), m=2, subplot=2, title=L"\mathbf{U}", color=:blues)
However, if we introduce row pivoting, bandedness may be expanded or destroyed.
fact = lu(A)
plot(layout=2)
spy!(sparse(fact.L), m=2, subplot=1, title=L"\mathbf{L}", color=:blues)
spy!(sparse(fact.U), m=2, subplot=2, title=L"\mathbf{U}", color=:blues)
Example 8.1.4
The following generates a random sparse matrix with prescribed eigenvalues.
n = 4000
density = 4e-4
λ = @. 1 + 1 / (1:n) # exact eigenvalues
A = FNC.sprandsym(n, density, λ);
The eigs function from Arpack finds a small number of eigenvalues meeting some criterion. First, we ask for the 5 of largest (complex) magnitude using which=:LM.
using Arpack
λmax, V = eigs(A, nev=5, which=:LM) # Largest Magnitude
fmt = ft_printf("%20.15f")
pretty_table([λmax λ[1:5]], header=["found", "exact"], formatters=fmt)
┌──────────────────────┬──────────────────────┐
│ found │ exact │
├──────────────────────┼──────────────────────┤
│ 2.000000000000002 │ 2.000000000000000 │
│ 1.500000000000005 │ 1.500000000000000 │
│ 1.333333333333334 │ 1.333333333333333 │
│ 1.250000000000001 │ 1.250000000000000 │
│ 1.199999999999999 │ 1.200000000000000 │
└──────────────────────┴──────────────────────┘
Now we find the 5 closest to the value 1 in the complex plane, via sigma=1.
λ1, V = eigs(A, nev=5, sigma=1) # closest to sigma
data = [λ1 λ[end:-1:end-4]]
pretty_table(data, header=["found", "exact"], formatters=fmt)
┌──────────────────────┬──────────────────────┐
│ found │ exact │
├──────────────────────┼──────────────────────┤
│ 1.000250000000000 │ 1.000250000000000 │
│ 1.000250062515629 │ 1.000250062515629 │
│ 1.000250125062531 │ 1.000250125062531 │
│ 1.000250187640731 │ 1.000250187640731 │
│ 1.000250250250250 │ 1.000250250250250 │
└──────────────────────┴──────────────────────┘
The time needed to solve a sparse linear system is not easy to predict unless you have some more information about the matrix. But it will typically be orders of magnitude faster than the dense version of the same problem.
x = @. 1 / (1:n);
b = A * x;
norm(x - A \ b); # force compilation
t = @elapsed sparse_err = norm(x - A \ b)
println("Time for sparse solve: $t")
Time for sparse solve: 0.276979542
D = Matrix(A) # convert to regular matrix
norm(x - D \ b);
t = @elapsed dense_err = norm(x - D \ b)
println("Time for dense solve: $t")
Time for dense solve: 3.257475875
@show sparse_err;
@show dense_err;
sparse_err = 5.641517106242322e-17
dense_err = 1.654403346772913e-17
8.2 Power iteration¶
Example 8.2.1
Here we choose a random 5×5 matrix and a random 5-vector.
A = rand(1.0:9.0, 5, 5)
A = A ./ sum(A, dims=1)
x = randn(5)
5-element Vector{Float64}:
-2.051598006971385
-2.409741337932436
0.4319487870679362
0.6624067614868305
-0.0899512129133455
Applying matrix-vector multiplication once doesn’t do anything recognizable.
y = A * x
5-element Vector{Float64}:
-0.555495893170436
-0.5786853253668668
-0.6831248064243652
-1.3818487318911952
-0.25778025240953634
Repeating the multiplication still doesn’t do anything obvious.
z = A * y
5-element Vector{Float64}:
-0.8091431080089553
-0.7562880842369648
-0.4768281542191334
-0.5889868825578362
-0.8256887802395099
But if we keep repeating the matrix-vector multiplication, something remarkable happens: $\mathbf{A}\mathbf{x} \approx \mathbf{x}$.
for j in 1:8
x = A * x
end
[x A * x]
5×2 Matrix{Float64}:
-0.682612 -0.682518
-0.616908 -0.61682
-0.552805 -0.552858
-0.717918 -0.718057
-0.886691 -0.886682
This phenomenon seems to occur regardless of the starting vector.
x = randn(5)
for j in 1:8
x = A * x
end
[x A * x]
5×2 Matrix{Float64}:
-0.161391 -0.161364
-0.145862 -0.145832
-0.130689 -0.130708
-0.169732 -0.169768
-0.209628 -0.20963
Example 8.2.2
We will experiment with the power iteration on a 5×5 matrix with prescribed eigenvalues and dominant eigenvalue at 1.
λ = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5, 5), 1) + diagm(λ)
5×5 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0
0.0 -0.75 1.0 1.0 1.0
0.0 0.0 0.6 1.0 1.0
0.0 0.0 0.0 -0.4 1.0
0.0 0.0 0.0 0.0 0.0
We run the power iteration 60 times. The best estimate of the dominant eigenvalue is the last entry of the first output.
β, x = FNC.poweriter(A, 60)
eigval = β[end]
1.000000003344092
We check for linear convergence using a log-linear plot of the error.
using Plots
err = @. 1 - β
plot(0:59, abs.(err); m=:o,
xlabel=L"k",
yaxis=(L"|\lambda_1-\beta_k|", :log10, [1e-10, 1]),
title="Convergence of power iteration")
The asymptotic trend seems to be a straight line, consistent with linear convergence. To estimate the convergence rate, we look at the ratio of two consecutive errors in the linear part of the convergence curve. The ratio of the first two eigenvalues should match the observed rate.
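The reasoning behind that check is the standard convergence estimate for power iteration (restated here as a gloss): the error behaves asymptotically like

$$
\beta_k - \lambda_1 \approx c\left(\frac{\lambda_2}{\lambda_1}\right)^{\!k},
\qquad\text{so}\qquad
\frac{\beta_{k+1} - \lambda_1}{\beta_k - \lambda_1} \to \frac{\lambda_2}{\lambda_1}.
$$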
@show theory = λ[2] / λ[1];
@show observed = err[40] / err[39];
theory = λ[2] / λ[1] = -0.75
observed = err[40] / err[39] = -0.7520004611727363
Note that the error is supposed to change sign on each iteration. The effect of these alternating signs is that estimates oscillate around the exact value.
β[26:30]
5-element Vector{Float64}:
1.0000607762580542
0.9999565697313072
1.0000338636650483
0.9999753768143407
1.0000189320772697
In practical situations, we don’t know the exact eigenvalue that the algorithm is supposed to find. In that case we would base errors on the final β that was found, as in the following plot.
err = @. β[end] - β[1:end-1]
plot(0:58, abs.(err), m=:o,
xlabel=L"k",
yaxis=(L"|\beta_{60}-\beta_k|", :log10, [1e-10, 1]),
title="Convergence of power iteration")
The results are very similar until the last few iterations, when the limited accuracy of the reference value begins to show. That is, while it is a good estimate of $\lambda_1$, it is less good as an estimate of the error in nearby estimates.
8.3 Inverse iteration¶
Example 8.3.1
We set up a triangular matrix with prescribed eigenvalues on its diagonal.
λ = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5, 5), 1) + diagm(λ)
5×5 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0
0.0 -0.75 1.0 1.0 1.0
0.0 0.0 0.6 1.0 1.0
0.0 0.0 0.0 -0.4 1.0
0.0 0.0 0.0 0.0 0.0
We run inverse iteration with the shift $s=0.7$ and take the final estimate as our “exact” answer to observe the convergence.
s = 0.7
β, x = FNC.inviter(A, s, 30)
eigval = β[end]
0.5999999999999984
As expected, the eigenvalue that was found is the one closest to 0.7. The convergence is again linear.
using Plots
err = @. abs(eigval - β)
plot(0:28, err[1:end-1];
m=:o, xlabel=L"k",
yaxis=(L"|\lambda_3-\beta_k|", :log10, [1e-16, 1]),
title="Convergence of inverse iteration")
The observed linear convergence rate is found from the data.
@show observed_rate = err[22] / err[21];
observed_rate = err[22] / err[21] = 0.33326844048322757
We reorder the eigenvalues to enforce (8.3.3).
Tip
The sortperm function returns the index permutation needed to sort the given vector, rather than the sorted vector itself.
λ = λ[sortperm(abs.(λ .- s))]
5-element Vector{Float64}:
0.6
1.0
0.0
-0.4
-0.75
Hence the theoretical convergence rate is
@show theoretical_rate = (λ[1] - s) / (λ[2] - s);
theoretical_rate = (λ[1] - s) / (λ[2] - s) = -0.3333333333333332
Example 8.3.2
λ = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5, 5), 1) + diagm(λ)
5×5 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0
0.0 -0.75 1.0 1.0 1.0
0.0 0.0 0.6 1.0 1.0
0.0 0.0 0.0 -0.4 1.0
0.0 0.0 0.0 0.0 0.0
We begin with a shift $s=0.7$, which is closest to the eigenvalue 0.6.
s = 0.7
x = ones(5)
y = (A - s * I) \ x
β = x[1] / y[1] + s
0.7034813925570228
Note that the result is not yet any closer to the targeted 0.6. But we proceed (without being too picky about normalization here).
s = β
x = y / y[1]
y = (A - s * I) \ x
β = x[1] / y[1] + s
0.5612761406172997
Still not much apparent progress. However, in just a few more iterations the results are dramatically better.
for k in 1:4
s = β
x = y / y[1]
y = (A - s * I) \ x
@show β = x[1] / y[1] + s
end
β = x[1] / y[1] + s = 0.5964312884753865
β = x[1] / y[1] + s = 0.5999717091820104
β = x[1] / y[1] + s = 0.5999999978556353
β = x[1] / y[1] + s = 0.6
8.4 Krylov subspaces¶
Example 8.4.1
First we define a triangular matrix with known eigenvalues, and a random vector $\mathbf{b}$.
λ = @. 10 + (1:100)
A = triu(rand(100, 100), 1) + diagm(λ)
b = rand(100);
Next we build up the first 30 Krylov matrices iteratively, using renormalization after each matrix-vector multiplication.
Km = [b zeros(100, 29)]
for m in 1:29
v = A * Km[:, m]
Km[:, m+1] = v / norm(v)
end
Now we solve least-squares problems for Krylov matrices of increasing dimension, recording the residual in each case.
resid = zeros(30)
for m in 1:30
z = (A * Km[:, 1:m]) \ b
x = Km[:, 1:m] * z
resid[m] = norm(b - A * x)
end
The linear system approximations show smooth linear convergence at first, but the convergence stagnates after only a few digits have been found.
using Plots
plot(0:29, resid; m=:o,
xaxis=(L"m"), yaxis=(:log10, L"\| b-Ax_m \|"),
title="Residual for linear systems", legend=:none)
Example 8.4.2
We illustrate a few steps of the Arnoldi iteration for a small matrix.
A = rand(1.0:9.0, 6, 6)
6×6 Matrix{Float64}:
9.0 4.0 7.0 4.0 3.0 7.0
7.0 2.0 7.0 6.0 4.0 3.0
4.0 8.0 8.0 4.0 3.0 9.0
4.0 3.0 8.0 8.0 1.0 4.0
1.0 5.0 1.0 4.0 3.0 7.0
2.0 4.0 3.0 1.0 4.0 5.0
The seed vector we choose here determines the first member of the orthonormal basis.
u = randn(6)
Q = u / norm(u);
Multiplication by $\mathbf{A}$ gives us a new vector in $\mathcal{K}_2$.
Aq = A * Q[:, 1];
We subtract off its projection in the previous direction. The remainder is rescaled to give us the next orthonormal column.
v = Aq - dot(Q[:, 1], Aq) * Q[:, 1]
Q = [Q v / norm(v)];
On the next pass, we have to subtract off the projections in two previous directions.
Aq = A * Q[:, 2]
v = Aq - dot(Q[:, 1], Aq) * Q[:, 1] - dot(Q[:, 2], Aq) * Q[:, 2]
Q = [Q v / norm(v)];
At every step, $\mathbf{Q}_m$ is an ONC matrix.
@show opnorm(Q' * Q - I);
opnorm(Q' * Q - I) = 1.2267554763961925e-16
And $\mathbf{Q}_3$ spans the same space as the three-dimensional Krylov matrix.
K = [u A * u A * A * u];
@show rank([Q K]);
rank([Q K]) = 3
8.5 GMRES¶
Example 8.5.1
We define a triangular matrix with known eigenvalues and a random vector $\mathbf{b}$.
λ = @. 10 + (1:100)
A = triu(rand(100, 100), 1) + diagm(λ)
b = rand(100);
Instead of building the Krylov matrices, we use the Arnoldi iteration to generate equivalent orthonormal vectors.
Q, H = FNC.arnoldi(A, b, 60);
The Arnoldi bases are used to solve the least-squares problems defining the GMRES iterates.
resid = [norm(b); zeros(60)]
for m in 1:60
s = [norm(b); zeros(m)]
z = H[1:m+1, 1:m] \ s
x = Q[:, 1:m] * z
resid[m+1] = norm(b - A * x)
end
The approximations converge smoothly, practically all the way to machine epsilon.
using Plots
plot(0:60, resid, m=:o,
xaxis=(L"m"), yaxis=(:log10, "norm of mth residual"),
title="Residual for GMRES", legend=:none)
Example 8.5.2
The following experiments are based on a matrix resulting from discretization of a partial differential equation.
A = FNC.poisson(50)
n = size(A, 1)
b = ones(n);
spy(A, color=:blues)
We compare unrestarted GMRES with three different thresholds for restarting. Here we are using gmres from the IterativeSolvers package, since our simple implementation does not offer restarting.
Tip
The syntax f(x;foo) is shorthand for f(x,foo=foo).
using IterativeSolvers
reltol = 1e-12;
plt = plot(title="Convergence of restarted GMRES", legend=:bottomleft,
xaxis=(L"m"), yaxis=(:log10, "residual norm", [1e-8, 100]))
for restart in [n, 20, 40, 60]
x, hist = IterativeSolvers.gmres(A, b; restart, reltol,
maxiter=100, log=true)
plot!(hist[:resnorm], label="restart = $restart")
end
plt
The “pure” GMRES curve is the lowest one. All of the other curves agree with it until the first restart. Decreasing the restart value makes the convergence per iteration generally worse, but the time required per iteration smaller as well.
8.6 MINRES and conjugate gradients¶
Example 8.6.2
The following matrix is indefinite.
A = FNC.poisson(10) - 20I
λ = eigvals(Matrix(A))
isneg = @. λ < 0
@show sum(isneg), sum(.!isneg);
(sum(isneg), sum(.!(isneg))) = (13, 87)
We can compute the relevant quantities from Theorem 8.6.1.
mn, mx = extrema(-λ[isneg])
κ₋ = mx / mn
mn, mx = extrema(λ[.!isneg])
κ₊ = mx / mn
ρ = (sqrt(κ₋ * κ₊) - 1) / (sqrt(κ₋ * κ₊) + 1)
0.9026418585584018
Because the iteration number is halved in (8.6.4), the rate constant of linear convergence is the square root of this number, which makes it even closer to 1.
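Spelling out that step (a gloss on the statement above): the theorem bounds the relative residual after $m$ steps by roughly $\rho^{m/2}$, so the effective per-iteration rate is

$$
\left(\rho^{m/2}\right)^{1/m} = \sqrt{\rho} \approx \sqrt{0.9026} \approx 0.95.
$$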
Now we apply MINRES to a linear system with this matrix, and compare the observed convergence to the upper bound from the theorem.
using IterativeSolvers, Plots
b = rand(100)
x, hist = minres(A, b, reltol=1e-10, maxiter=51, log=true);
relres = hist[:resnorm] / norm(b)
m = 0:length(relres)-1
plot(m, relres;
label="observed", legend=:left,
xaxis=L"m", yaxis=(:log10, "relative residual"),
title=("Convergence of MINRES"))
plot!(m, ρ .^ (m / 2), l=:dash, label="upper bound")
The upper bound turns out to be pessimistic here, especially in the later iterations. However, you can see a slow linear phase in the convergence that tracks the bound closely.
Example 8.6.3
We will compare MINRES and CG on some quasi-random SPD problems. The first matrix has a condition number of 100.
n = 5000
density = 0.001
A = FNC.sprandsym(n, density, 1 / 100)
x = (1:n) / n
b = A * x;
Now we apply both methods and compare the convergence of the system residuals, using implementations imported from IterativeSolvers.
plt = plot(title="Convergence of MINRES and CG",
xaxis=("Krylov dimension"), yaxis=(:log10, "relative residual norm"))
for method in [minres, cg]
x̃, history = method(A, b, reltol=1e-6, maxiter=1000, log=true)
relres = history[:resnorm] / norm(b)
plot!(0:length(relres)-1, relres, label="$method")
err = round(norm(x̃ - x) / norm(x), sigdigits=4)
println("$method error: $err")
end
plt
There is little difference between the two methods here. Next, we increase the condition number of the matrix by a factor of 25. The rule of thumb predicts that the number of iterations required should increase by a factor of about 5.
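The arithmetic behind that prediction (our gloss on the rule of thumb that iteration counts scale like $\sqrt{\kappa}$):

$$
\sqrt{\frac{2500}{100}} = \sqrt{25} = 5.
$$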
A = FNC.sprandsym(n, density, 1 / 2500)
b = A * x;
plt = plot(title="Convergence of MINRES and CG",
xaxis=("Krylov dimension"), yaxis=(:log10, "relative residual norm"))
for method in [minres, cg]
x̃, history = method(A, b, reltol=1e-6, maxiter=1000, log=true)
relres = history[:resnorm] / norm(b)
plot!(0:length(relres)-1, relres, label="$method")
err = round(norm(x̃ - x) / norm(x), sigdigits=4)
println("$method error: $err")
end
plt
Both methods have an early superlinear phase that allows them to finish slightly sooner than the factor of 5 predicted: Theorem 8.6.2 is an upper bound, not necessarily an approximation. Both methods ultimately achieve the same reduction in the residual; MINRES stops earlier, but with a slightly larger error.
8.7 Matrix-free iterations¶
Example 8.7.1
We use a readily available test image.
using Images, TestImages
img = testimage("mandrill")
m, n = size(img)
X = @. Float64(Gray(img))
plot(Gray.(X), title="Original image", aspect_ratio=1)
We define the one-dimensional tridiagonal blurring matrices.
using SparseArrays
function blurmatrix(d)
v1 = fill(0.25, d - 1)
return spdiagm(0 => fill(0.5, d), 1 => v1, -1 => v1)
end
B, C = blurmatrix(m), blurmatrix(n);
Finally, we show the results of using $k=12$ repetitions of the blur in each direction.
using Plots
blur = X -> B^12 * X * C^12;
Z = blur(X)
plot(Gray.(Z), title="Blurred image")
Example 8.7.2
We repeat the earlier process to blur an original image $\mathbf{X}$ to get $\mathbf{Z}$.
img = testimage("lighthouse")
m, n = size(img)
X = @. Float64(Gray(img))
B = spdiagm(0 => fill(0.5, m),
1 => fill(0.25, m - 1), -1 => fill(0.25, m - 1))
C = spdiagm(0 => fill(0.5, n),
1 => fill(0.25, n - 1), -1 => fill(0.25, n - 1))
blur = X -> B^12 * X * C^12
Z = blur(X)
plot(Gray.(Z), title="Blurred image")
Now we imagine that $\mathbf{X}$ is unknown and that we want to recover it from $\mathbf{Z}$. We first need functions that translate between vector and matrix representations.
# vec (built-in) converts matrix to vector
unvec = z -> reshape(z, m, n); # convert vector to matrix
Now we declare the three-step blur transformation as a LinearMap, supplying also the size of the vector form of an image.
using LinearMaps
T = LinearMap(x -> vec(blur(unvec(x))), m * n);
The blurring operators are symmetric, so we apply minres to the composite blurring transformation T.
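In equation form (our gloss): the blur is linear in the image, so deblurring amounts to solving the linear system

$$
\mathbf{T}\,\operatorname{vec}(\mathbf{X}) = \operatorname{vec}(\mathbf{Z})
$$

for $\operatorname{vec}(\mathbf{X})$, which is the job handed to minres below.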
Tip
The function clamp01 in Images restricts values to be in the interval $[0,1]$.
using IterativeSolvers
y = minres(T, vec(Z), maxiter=50, reltol=1e-5);
Y = unvec(clamp01.(y))
plot(Gray.(X), layout=2, title="Original")
plot!(Gray.(Y), subplot=2, title="Deblurred")
8.8 Preconditioning¶
Example 8.8.1
Here is an SPD matrix that arises from solving partial differential equations.
using MatrixDepot
A = matrixdepot("wathen", 60)
n = size(A, 1)
@show n, nnz(A);
(n, nnz(A)) = (11041, 170161)
There is an easy way to use the diagonal elements of $\mathbf{A}$, or any other vector, as a diagonal preconditioner.
using Preconditioners
b = ones(n)
M = DiagonalPreconditioner(diag(A));
We now compare CG with and without the preconditioner.
using IterativeSolvers, Plots
plain(b) = cg(A, b, maxiter=200, reltol=1e-4, log=true)
time_plain = @elapsed x, hist1 = plain(b)
prec(b) = cg(A, b, Pl=M, maxiter=200, reltol=1e-4, log=true)
time_prec = @elapsed x, hist2 = prec(b)
@show time_plain, time_prec
rr = hist1[:resnorm]
plot(0:length(rr)-1, rr / rr[1], yscale=:log10, label="plain")
rr = hist2[:resnorm]
plot!(0:length(rr)-1, rr / rr[1], yscale=:log10, label="preconditioned")
title!("Diagonal preconditioning in CG")
The diagonal preconditioner cut down substantially on the number of iterations. The effect on the total time is less dramatic, but this is not a large version of the problem.
Example 8.8.2
Here is a nonsymmetric matrix arising from a probabilistic model in computational chemistry.
using SparseArrays
A = sparse(matrixdepot("Watson/chem_master1"))
n = size(A, 1)
@show n, nnz(A), issymmetric(A)
(n, nnz(A), issymmetric(A)) = (40401, 201201, false)
(40401, 201201, false)
Without a preconditioner, GMRES makes essentially no progress after 100 iterations.
b = rand(n)
const GMRES = IterativeSolvers.gmres
x, history = GMRES(A, b, maxiter=100, reltol=1e-5, log=true)
resnorm = history[:resnorm]
@show resnorm[end] / resnorm[1];
resnorm[end] / resnorm[1] = 0.969059695989308
The following version of incomplete LU factorization drops all sufficiently small values (i.e., replaces them with zeros). This keeps the number of nonzeros in the factors under control.
using IncompleteLU
iLU = ilu(A, τ=0.25)
@show nnz(iLU) / nnz(A);
nnz(iLU) / nnz(A) = 9.654509669435043
The result is almost 10 times as dense as $\mathbf{A}$ and yet still not a true factorization of it. However, it’s close enough for an approximate inverse in a preconditioner. The actual preconditioning matrix is $\mathbf{M}=\mathbf{L}\mathbf{U}$, but we just supply the factorization to gmres.
_, history = GMRES(A, b, Pl=iLU, maxiter=100, reltol=1e-5, log=true)
history
Converged after 7 iterations.
The τ parameter in ilu balances the accuracy of the iLU factorization with the time needed to compute it and invert it. As $\tau \to 0$, more of the elements are kept, making the preconditioner more effective but slower per iteration.
plt = plot(0:40, resnorm[1:41] / resnorm[1];
label="no preconditioning", legend=:bottomright,
xaxis=("iteration number"),
yaxis=(:log10, "residual norm"),
title="Incomplete LU preconditioning")
for τ in [2, 1, 0.25, 0.1]
t = @elapsed iLU = ilu(A; τ)
t += @elapsed _, history = GMRES(A, b, Pl=iLU, maxiter=100,
reltol=1e-5, log=true)
resnorm = history[:resnorm]
label = "τ = $τ, time = $(round(t,digits=3))"
plot!(0:length(resnorm)-1, resnorm / resnorm[1]; label)
end
plt
In any given problem, it’s impossible to know in advance where the right balance lies between fidelity and speed for the preconditioner.