We have seen before that certain matrix properties can be exploited to improve the solution of linear algebra problems. One of the most important of these is the case $A^*=A$, i.e., $A$ is hermitian. The Arnoldi iteration has a particularly useful specialization to this case. In this section we describe the resulting algorithms, but we do not derive them in full detail or present robust implementations.
Left-multiplying the Arnoldi relation $A Q_m = Q_{m+1} H_m$ by $Q_m^*$ gives

$$Q_m^* A Q_m = Q_m^* Q_{m+1} H_m = \tilde{H}_m,$$

where $\tilde{H}_m$ is rows 1 through $m$ of $H_m$. If $A$ is hermitian, then so is the left side of this equation, hence $\tilde{H}_m$ is hermitian too. But it is also upper Hessenberg, meaning that the $(i,j)$ element is zero if $i>j+1$. By symmetry, the elements are zero when $j>i+1$ as well, so $\tilde{H}_m$ is tridiagonal.
Equation (8.4.6) of the Arnoldi iteration now simplifies to a much shorter expression, the three-term recurrence

$$A q_m = H_{m-1,m}\, q_{m-1} + H_{m,m}\, q_m + H_{m+1,m}\, q_{m+1}.$$
As in the derivation of the Arnoldi iteration, given the first $m$ vectors we can solve for the entries in column $m$ of $H$ and then for $q_{m+1}$. The resulting process is known as the Lanczos iteration. Its most important practical advantage is that while Arnoldi needs $O(m)$ vector operations to get $q_{m+1}$ from the previous vectors, Lanczos needs only $O(1)$, so restarting isn't required for symmetric matrices.[1]
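Although we do not develop a practical implementation, the recurrence is short enough to sketch. The following is a bare-bones sketch in Python/NumPy, not a robust implementation: it applies the three-term recurrence directly and performs no reorthogonalization, so the orthogonality of the computed basis can degrade in floating point (see the footnote).

```python
import numpy as np

def lanczos(A, u, m):
    """Illustrative Lanczos sketch: m steps of the three-term recurrence.
    Returns Q (n by m+1) with nominally orthonormal columns and the
    (m+1) by m tridiagonal matrix H satisfying A @ Q[:, :m] = Q @ H
    up to roundoff. No reorthogonalization is performed."""
    n = len(u)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = u / np.linalg.norm(u)
    for j in range(m):
        v = A @ Q[:, j]
        if j > 0:
            H[j - 1, j] = H[j, j - 1]        # symmetry of the tridiagonal matrix
            v -= H[j - 1, j] * Q[:, j - 1]
        H[j, j] = Q[:, j] @ v
        v -= H[j, j] * Q[:, j]
        beta = np.linalg.norm(v)
        if beta == 0:                        # breakdown: Krylov subspace is invariant
            break
        H[j + 1, j] = beta
        Q[:, j + 1] = v / beta
    return Q, H

# Quick check on a random symmetric matrix (illustrative values only):
rng = np.random.default_rng(0)
B = rng.standard_normal((200, 200))
A = B + B.T
Q, H = lanczos(A, rng.standard_normal(200), 30)
print(np.linalg.norm(A @ Q[:, :30] - Q @ H))   # near machine precision
```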
When $A$ is hermitian and the Arnoldi iteration is reduced to Lanczos, the analog of GMRES is known as MINRES. Like GMRES, MINRES minimizes the residual $\|b-Ax\|$ over Krylov subspaces of increasing dimension.
MINRES is also more theoretically tractable than GMRES. The following result relies on some advanced approximation theory. Recall that the eigenvalues of a hermitian matrix are real.
The upper bound (8.6.4) implies that the residual decreases at a linear rate. As the product $\kappa_+\kappa_-$ grows, the rate of this convergence approaches 1. Hence eigenvalues close to the origin (relative to the eigenvalues of largest magnitude) are expected to slow the convergence. The bound for a definite matrix is better, as the next theorem shows.
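As a quick numerical illustration (a sketch using SciPy's minres on a made-up indefinite diagonal matrix; any comparable solver would serve), one can watch the residual norm decay at a roughly linear rate:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import minres

# Made-up hermitian indefinite example: eigenvalues filling [-10,-1] and [1,10].
d = np.concatenate([np.linspace(-10, -1, 100), np.linspace(1, 10, 100)])
A = diags(d)
b = np.full(200, 200**-0.5)               # a unit vector

resid = []
minres(A, b, rtol=1e-14, maxiter=80,      # rtol was named tol in older SciPy versions
       callback=lambda xk: resid.append(np.linalg.norm(b - A @ xk)))
print(np.array(resid[::10]))              # decreases by a roughly constant ratio
```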
Given positive definiteness in addition to symmetry, we arrive at perhaps the most famous Krylov subspace method for Ax=b, called conjugate gradients.
Suppose now that $A$ is hermitian and positive definite (HPD). Then $A$ has a Cholesky factorization, which in the complex case is $A=R^*R$. Therefore, for any vector $u$,

$$u^* A u = (Ru)^*(Ru) = \|Ru\|^2 \ge 0.$$

This quantity is zero only when $u=0$, since $R$ is nonsingular, so $\|u\|_A = \sqrt{u^*Au}$ defines a vector norm, called the A-norm.
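This identity is easy to verify numerically. Here is a small NumPy sketch with made-up data (note that numpy.linalg.cholesky returns a lower triangular factor, so $R$ is taken to be its transpose):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B.T @ B + 6 * np.eye(6)        # a made-up SPD matrix
R = np.linalg.cholesky(A).T        # A = R.T @ R with R upper triangular (real case)
u = rng.standard_normal(6)

# u'Au equals ||Ru||^2, hence is nonnegative; its square root is the A-norm of u.
print(u @ A @ u, np.linalg.norm(R @ u) ** 2)
```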
The convergence of CG and MINRES depends on the eigenvalues of $A$. In the HPD case the eigenvalues are real and positive, and they equal the singular values. Hence the condition number $\kappa$ is equal to the ratio of the largest eigenvalue to the smallest one. The following theorem suggests that MINRES and CG are not so different in convergence.
Theorem 8.6.2 characterizes the convergence of MINRES and CG similarly, differing only in whether the measurement is of the residual or the A-norm of the error, respectively. While these are different quantities, in practice one may not find a consistent advantage for one method over the other.
As in the indefinite case with MINRES, a larger condition number is associated with slower convergence in the positive definite case. Specifically, making the bound in (8.6.10) no larger than a tolerance $\epsilon$ requires

$$2 \left( \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \right)^{m} \le \epsilon, \qquad \text{i.e.,} \qquad m \ge \frac{\ln(\epsilon/2)}{\ln\left( \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \right)}.$$

For large $\kappa$ the denominator is approximately $-2/\sqrt{\kappa}$, so the number of iterations needed grows like $O\bigl(\sqrt{\kappa}\bigr)$.
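As a quick check of this estimate, with made-up values $\kappa=10^4$ and $\epsilon=10^{-6}$ the exact and asymptotic iteration counts nearly coincide:

```python
import numpy as np

kappa, eps = 1e4, 1e-6                         # made-up illustrative values
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
m_exact = np.log(eps / 2) / np.log(rho)        # solve 2*rho**m <= eps for m
m_approx = 0.5 * np.sqrt(kappa) * abs(np.log(eps / 2))
print(int(np.ceil(m_exact)), int(np.ceil(m_approx)))   # both 726
```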
✍ For each part, the eigenvalues of $A$ are given. Suppose MINRES is applied to solve $Ax=b$. Use (8.6.4) or (8.6.10), whichever is most appropriate, to determine a lower bound on $m$ that guarantees reduction of the residual norm by a factor of $10^{-4}$.
(a) $-100, -99, \ldots, -1, 1, 2, \ldots, 100$
(b) $-100, 1, 2, \ldots, 100$
(c) $1, 2, \ldots, 100$
⌨ Let $b$ be a random unit vector of length 202. Define a diagonal matrix $A$ with diagonal entries $d_k$ given by
(a) Apply 120 iterations of MINRES to solve Ax=b. Compute the relative error of the answer, and plot the norm of the residual as a function of m on a log-linear scale.
(b) Add to your graph the line representing the upper bound (8.6.4). (Ignore the rounding in the exponent.) This line should stay on or above the residual curve.
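One possible starting point for this exercise, assuming SciPy's minres is used (the diagonal entries below are placeholders to be replaced by the $d_k$ defined in the statement):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import minres

rng = np.random.default_rng()
b = rng.standard_normal(202)
b /= np.linalg.norm(b)                     # random unit vector

d = np.arange(1, 203, dtype=float)         # placeholder: replace with the d_k above
A = diags(d)

resid = [np.linalg.norm(b)]
x, _ = minres(A, b, rtol=1e-14, maxiter=120,
              callback=lambda xk: resid.append(np.linalg.norm(b - A @ xk)))
relative_error = np.linalg.norm(x - b / d) / np.linalg.norm(b / d)
# Plot resid against iteration number on a semilogy scale (e.g., with matplotlib).
```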
⌨ Let $b$ be a random unit vector of length 501. Define a sparse diagonal matrix $A$ with diagonal entries $d_k$ given by
(a) Apply 100 iterations of MINRES to solve $Ax=b$. Compute the relative error of the answer. Plot the norm of the residual as a function of $m$.
(b) Add to your graph the line representing the upper bound (8.6.10). This line should stay on or above the convergence curve.
(c) Add a convergence curve for 100 iterations of cg.
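A sketch of the setup, again assuming SciPy (with placeholder diagonal entries standing in for the $d_k$ from the statement):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, minres

rng = np.random.default_rng()
b = rng.standard_normal(501)
b /= np.linalg.norm(b)                     # random unit vector

d = np.arange(1, 502, dtype=float)         # placeholder: replace with the d_k above
A = diags(d)

res_mr, res_cg = [], []
minres(A, b, rtol=1e-14, maxiter=100,
       callback=lambda xk: res_mr.append(np.linalg.norm(b - A @ xk)))
cg(A, b, rtol=1e-14, maxiter=100,
   callback=lambda xk: res_cg.append(np.linalg.norm(b - A @ xk)))
# Plot res_mr and res_cg together on a semilogy scale, along with the bound (8.6.10).
```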
✍ Suppose a family of SPD matrices $A$ is parameterized by $t$, and that the condition numbers of the matrices scale like $O(t^2)$ as $t\to\infty$. Given that CG takes 60 iterations to reach a certain reduction in the error of a linear system when $t=200$, estimate the number of iterations CG will need to reach the same accuracy at $t=300$.
✍ Given a real $n\times n$ symmetric matrix $A$ and a vector $b=Ax$, we can define the scalar-valued function

$$\varphi(u) = u^T A u - 2 u^T b.$$
(a) Expand and simplify the expression φ(x+v)−φ(x), keeping in mind that Ax=b.
(b) Using the result of part (a), prove that if A is an SPD matrix, φ has a global minimum at x.
(c) Show that for any vector $u$, $\|u-x\|_A^2 - \varphi(u)$ is constant.
(d) Using the result of part (c), prove that CG minimizes φ(u) over Krylov subspaces.
⌨ Let $n=50$. Define the matrix $A_k$ as FNC.poisson(n) minus $k^2$ times the $n^2\times n^2$ identity matrix, and define the vector $b$ as $n^2$ copies of $-1$. The linear system $A_k x = b$ arises from the Helmholtz equation for wave propagation at a single frequency $k$.
(a) Apply both MINRES and CG to the Helmholtz system for $k=1.3$, solving to a relative residual tolerance of $10^{-5}$. Plot their convergence curves together.
(b) Repeat part (a) for k=8.
(c) Use eigs on the matrix from part (b) to show that it is indefinite. (Hint: Use additional arguments to get the eigenvalues with smallest and largest real parts.) This helps explain why the CG convergence curve for this matrix looks rather strange.
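One way the computation might be set up, assuming SciPy and that FNC.poisson(n) returns the sparse $n^2\times n^2$ matrix described above:

```python
import numpy as np
from scipy.sparse import eye
from scipy.sparse.linalg import cg, eigs, minres

import FNC                                 # the book's package, assumed to provide poisson(n)

n, k = 50, 1.3                             # repeat with k = 8 for parts (b)-(c)
A = FNC.poisson(n) - k**2 * eye(n**2)
b = -np.ones(n**2)

res_mr, res_cg = [], []
minres(A, b, rtol=1e-5,
       callback=lambda xk: res_mr.append(np.linalg.norm(b - A @ xk)))
cg(A, b, rtol=1e-5,
   callback=lambda xk: res_cg.append(np.linalg.norm(b - A @ xk)))
# Plot both residual histories together on a semilogy scale.

# Part (c): the extreme eigenvalues (smallest/largest real part) reveal indefiniteness.
print(eigs(A, 1, which="SR", return_eigenvectors=False),
      eigs(A, 1, which="LR", return_eigenvectors=False))
```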
In principle, the implementation of the Lanczos iteration is a minor change from Arnoldi, but maintaining numerical stability requires some extra analysis and effort. We do not present the details here.