To this point we have dealt frequently with the solution of the linear system Ax=b. Alongside that problem in importance to linear algebra is the eigenvalue problem.
A matrix with real entries can have complex eigenvalues. Therefore we assume that all matrices, vectors, and scalars may be complex in what follows. Recall that a complex number can be represented as $a+ib$ for real $a$ and $b$, where $i^2=-1$. The complex conjugate of $x=a+ib$ is denoted $\bar{x}$ and is given by $\bar{x}=a-ib$. The magnitude or modulus of a complex number $z=a+ib$ is $|z| = \sqrt{z\bar{z}} = \sqrt{a^2+b^2}$.
For the most part, “adjoint” replaces “transpose,” “hermitian” replaces “symmetric,” and “unitary matrix” replaces “orthogonal matrix” when applying our previous results to complex matrices.
An easy rewrite of the eigenvalue definition (7.2.1) is that (A−λI)x=0. Since an eigenvector x must be nonzero, A−λI is singular, and it therefore must have a zero determinant. This is the property most often used to compute eigenvalues by hand.
The determinant det(A−λI), regarded as a polynomial of degree n in λ, is called the characteristic polynomial. Its roots are the eigenvalues, so we know that an n×n matrix has n eigenvalues, counting algebraic multiplicity.
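For instance, here is a minimal check of this idea on an arbitrary 2×2 matrix (chosen only for illustration): the roots of det(A−λI), found by hand, agree with what `eigvals` returns, and A−λI is numerically singular at each root.

```julia
using LinearAlgebra

A = [1 2; 3 4]
# By hand: det(A - λI) = (1-λ)(4-λ) - 6 = λ² - 5λ - 2,
# whose roots are (5 ± √33)/2.
λ_hand = [(5 - sqrt(33)) / 2, (5 + sqrt(33)) / 2]
@show λ_hand
@show eigvals(A)                 # should match the hand computation
@show det(A - λ_hand[1] * I)     # ≈ 0, confirming that A - λI is singular
```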
Suppose that $A v_k = \lambda_k v_k$ for $k=1,\ldots,n$. We can summarize these as

$$
A V = V D, \tag{7.2.6}
$$

where $V=\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$ and $D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n)$.
If we find that V is a nonsingular matrix, then we arrive at a key factorization,[1] the eigenvalue decomposition (EVD):

$$
A = V D V^{-1}. \tag{7.2.7}
$$
Observe that if Av=λv for nonzero v, then the equation remains true for any nonzero multiple of v. Therefore, eigenvectors are not unique, and thus neither is an EVD.
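The sketch below illustrates these points numerically with Julia's `eigen` on a small matrix chosen arbitrarily for illustration: the computed V and D satisfy AV=VD, the product V D V⁻¹ reproduces A, and rescaling the columns of V yields a different but equally valid EVD.

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]            # small example matrix (chosen arbitrarily)
λ, V = eigen(A)                    # eigenvalues and matrix of eigenvectors
D = Diagonal(λ)

@show opnorm(A * V - V * D)        # ≈ 0, i.e., A*vₖ = λₖ*vₖ for each column
@show opnorm(A - V * D * inv(V))   # ≈ 0, the factorization A = V D V⁻¹

# Rescaling eigenvector columns gives another valid EVD of the same A:
W = V * Diagonal([2.0, -0.5])
@show opnorm(A - W * D * inv(W))   # still ≈ 0
```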
We stress that while (7.2.6) is possible for all square matrices, (7.2.7) is not. One simple example of a nondiagonalizable matrix is
There is a common circumstance in which we can guarantee an EVD exists. The proof of the following theorem can be found in many elementary texts on linear algebra.
Finally, given the convergence of Taylor polynomials to common functions, we are able to apply a function f to a square matrix by replacing p with f in (7.2.11).
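As a concrete sketch of this idea, the code below applies a function to a small diagonalizable matrix by applying it to the eigenvalues and reassembling with the eigenvector matrix, then compares the result with direct evaluation. The matrix, the polynomial, and the helper name `funm` are arbitrary choices made only for illustration.

```julia
using LinearAlgebra

A = [1.0 1.0; 0.0 2.0]        # small diagonalizable matrix (arbitrary choice)
λ, V = eigen(A)

# Apply a scalar function f to A via its eigenvalues: V * f(D) * V⁻¹.
funm(f, λ, V) = V * Diagonal(f.(λ)) * inv(V)

p(x) = x^3 - 2x + 1
@show opnorm(funm(p, λ, V) - (A^3 - 2A + I))   # matches direct polynomial evaluation
@show opnorm(funm(exp, λ, V) - exp(A))         # matches the built-in matrix exponential
```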
Just as linear systems have condition numbers that quantify the effect of finite precision, eigenvalue problems may be poorly conditioned too. While many possible results can be derived, we will use just one, the Bauer–Fike theorem.
The Bauer–Fike theorem tells us that eigenvalues can be perturbed by an amount that is κ(V) times larger than perturbations to the matrix. This result is less straightforward than it might seem: eigenvectors are not unique, so there are multiple possible values for κ(V). Even so, the theorem indicates caution when a matrix has eigenvectors that form an ill-conditioned matrix. The limiting case κ(V)=∞ might be interpreted as indicating a nondiagonalizable matrix A. The other extreme is also of interest: κ(V)=1, in which case the eigenvector matrix V may be taken to be unitary.
As we will see in Symmetry and definiteness, hermitian and real symmetric matrices are normal. Since the condition number of a unitary matrix is equal to 1, (7.2.13) guarantees that a perturbation of a normal matrix changes the eigenvalues by the same amount or less.
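A small experiment along these lines is sketched below; the matrix sizes, the perturbation level, and the helper `drift` are arbitrary choices for illustration. For a symmetric (hence normal) matrix, the eigenvector matrix has condition number essentially 1 and the eigenvalues move by no more than roughly the size of the perturbation, while a matrix built to have nearly parallel eigenvectors can see its eigenvalues move by far more, as the Bauer–Fike bound permits.

```julia
using LinearAlgebra

# Largest distance from a perturbed eigenvalue to the nearest original one,
# which is the quantity controlled by the Bauer–Fike bound.
drift(A, E, λ) = maximum(minimum(abs(μ - λk) for λk in λ) for μ in eigvals(A + E))

n = 6
E = 1e-6 * randn(n, n)                 # small perturbation

# Normal case: a symmetric matrix, whose eigenvector matrix can be taken unitary.
S = randn(n, n);  A = S + S'
λA, VA = eigen(A)
@show cond(VA)                         # ≈ 1
@show opnorm(E)
@show drift(A, E, λA)                  # comparable to ‖E‖ or smaller

# Nonnormal case: constructed so that the eigenvectors are nearly parallel.
V = [1.0 1.0; 0.0 1e-6]
B = V * Diagonal([1.0, 2.0]) * inv(V)
@show cond(V)                          # large
@show drift(B, 1e-6 * randn(2, 2), [1.0, 2.0])   # can far exceed ‖E‖
```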
Roots of the characteristic polynomial are not used in numerical methods for finding eigenvalues.[2] Practical algorithms for computing the EVD go beyond the scope of this book. The essence of the matter is the connection to matrix powers indicated in (7.2.10). (We will see much more about the importance of matrix powers in Chapter 8.)
If the eigenvalues have different complex magnitudes, then as k→∞ the entries on the diagonal of $D^k$ become increasingly well separated and easy to pick out. It turns out that there is an astonishingly easy and elegant way to accomplish this separation without explicitly computing the matrix powers.
The process demonstrated in Demo 7.2.4 is known as the Francis QR iteration, and it can be formulated as an $O(n^3)$ algorithm for finding the EVD. Such an algorithm is the foundation of what the eigen function uses.
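Below is a bare-bones sketch of the unshifted version of this iteration, not the practical Francis algorithm, which adds shifts and a preliminary reduction to upper Hessenberg form. The test matrix is manufactured to have well-separated real eigenvalues purely so that this simple variant converges.

```julia
using LinearAlgebra

# Plain (unshifted) QR iteration: replace A with RQ from its QR factorization.
# Each step is a similarity transformation, so the eigenvalues are preserved.
function qriteration(A, maxiter)
    Ak = copy(A)
    for k in 1:maxiter
        Q, R = qr(Ak)
        Ak = R * Q
    end
    return Ak
end

# Manufacture a matrix with known, well-separated real eigenvalues.
λ_true = [1.0, 2.0, 4.0, 8.0, 16.0]
V = randn(5, 5)
A = V * Diagonal(λ_true) * inv(V)

Ak = qriteration(A, 200)
@show sort(diag(Ak))      # ≈ λ_true, now exposed on the (near-)triangular diagonal
@show sort(eigvals(A))
```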
(a) ✍ Suppose that matrix A has an eigenvalue λ. Show that for any induced matrix norm, ∥A∥≥∣λ∣.
(b) ✍ Find a matrix A such that $\|A\|_2$ is strictly larger than $|\lambda|$ for all eigenvalues λ. (Proof-by-computer isn’t allowed here. You don’t need to compute $\|A\|_2$ exactly, just a lower bound for it.)
✍ Prove that the matrix B in (7.2.8) does not have two independent eigenvectors.
⌨ Use eigvals to find the eigenvalues of each matrix. Then for each eigenvalue λ, use rank to verify that λI minus the given matrix is singular.
by finding the eigenvectors. (Start by showing that $[1,0,0]^T$ is an eigenvector. Then show how to make $[a,1,0]^T$ an eigenvector, except for one case that does not change the outcome. Continue the same logic for $[a,b,1]^T$.)
(b) Use (7.2.12) to evaluate p(A), where $p(x)=(x-\pi)^4$.
(c) Use the function analog of (7.2.12) to evaluate cos(A).
⌨ In Exercise 2.3.5, you showed that the displacements of point masses placed along a string satisfy a linear system Aq=f for an (n−1)×(n−1) matrix A. The eigenvalues and eigenvectors of A correspond to resonant frequencies and modes of vibration of the string. For n=40 and the physical parameters given in part (b) of that exercise, find the eigenvalue decomposition of A. Report the three eigenvalues with smallest absolute value, and plot all three associated eigenvectors on a single graph (as functions of the vector row index).
⌨ Demo 7.2.4 suggests that the result of the Francis QR iteration as k→∞ sorts the eigenvalues on the diagonal according to a particular ordering. Following the code there as a model, create a random matrix with eigenvalues equal to −9.6,−8.6,…,10.4, perform the iteration 200 times, and check whether the sorting criterion holds in your experiment as well.
⌨ Eigenvalues of random matrices and their perturbations can be very interesting.
(a) Let A=randn(60,60). Scatter plot its eigenvalues in the complex plane, using aspect_ratio=1 and red diamonds as markers.
(b) Let E be another random 60×60 matrix, and on top of the previous graph, plot the eigenvalues of A+0.05E as blue dots. Repeat this for 100 different values of E.
(c) Let T=triu(A). On a new graph, scatter plot the eigenvalues of T in the complex plane. (They all lie on the real axis.)
(d) Repeat part (b) with T in place of A.
(e) Compute some condition numbers and apply Theorem 7.2.3 to explain the dramatic difference in the dot distributions between your plots.