
Eigenvalue decomposition

To this point we have dealt frequently with the solution of the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$. Alongside this problem in its importance to linear algebra is the eigenvalue problem.

7.2.1 Complex matrices

A matrix with real entries can have complex eigenvalues. Therefore we assume all matrices, vectors, and scalars may be complex in what follows. Recall that a complex number can be represented as $a+i b$ for real $a$ and $b$ and where $i^2=-1$. The complex conjugate of $x=a+i b$ is denoted $\bar{x}$ and is given by $\bar{x}=a-i b$. The magnitude or modulus of a complex number $z$ is

$$|z| = \sqrt{z\cdot \bar{z}}.$$

For the most part, “adjoint” replaces “transpose,” “hermitian” replaces “symmetric,” and “unitary matrix” replaces “orthogonal matrix” when applying our previous results to complex matrices.
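As a small illustration in Python with NumPy (the matrices `R` and `M` and the number `z` are choices made here, not taken from the text), a real rotation matrix has purely imaginary eigenvalues, the modulus formula agrees with the built-in `abs`, and the adjoint is formed as a conjugate transpose.

```python
import numpy as np

# A real 90-degree rotation matrix; its eigenvalues are +i and -i.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
lam = np.linalg.eigvals(R)

# Modulus of a complex number via sqrt(z * conj(z)).
z = 3 + 4j
modulus = np.sqrt(z * np.conj(z)).real

# The adjoint (conjugate transpose) plays the role of the transpose.
M = np.array([[1 + 2j, 3],
              [4j, 5]])
M_adj = M.conj().T

print(lam, modulus, abs(z))
```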

7.2.2 Eigenvalue decomposition

An easy rewrite of the eigenvalue definition (7.2.1), $\mathbf{A}\mathbf{x} = \lambda\mathbf{x}$ with $\mathbf{x}\neq\boldsymbol{0}$, is that $(\mathbf{A} - \lambda\mathbf{I}) \mathbf{x} = \boldsymbol{0}$. Hence $(\mathbf{A} - \lambda\mathbf{I})$ is singular, and it therefore must have a zero determinant. This is the property most often used to compute eigenvalues by hand.

The determinant $\det(\mathbf{A} - \lambda \mathbf{I})$ is called the characteristic polynomial. Its roots are the eigenvalues, so we know that an $n\times n$ matrix has $n$ eigenvalues, counting algebraic multiplicity.
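For a $2\times 2$ matrix the characteristic polynomial can be written down directly as $\lambda^2 - \operatorname{tr}(\mathbf{A})\,\lambda + \det(\mathbf{A})$. The sketch below, with an example matrix chosen here for illustration, checks that its roots agree with the eigenvalues NumPy reports. It only mirrors the by-hand approach.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Coefficients of lambda^2 - trace(A) * lambda + det(A), highest degree first.
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
char_roots = np.sort(np.roots(coeffs))

# Eigenvalues computed directly, for comparison.
eigs = np.sort(np.linalg.eigvals(A))

print(char_roots)
print(eigs)
```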

Suppose that $\mathbf{A}\mathbf{v}_k=\lambda_k\mathbf{v}_k$ for $k=1,\ldots,n$. We can summarize these as

$$\begin{split} \begin{bmatrix} \mathbf{A}\mathbf{v}_1 & \mathbf{A}\mathbf{v}_2 & \cdots & \mathbf{A}\mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \lambda_1 \mathbf{v}_1 & \lambda_2\mathbf{v}_2 & \cdots & \lambda_n \mathbf{v}_n \end{bmatrix}, \\[1mm] \mathbf{A} \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}, \end{split}$$

which we write as

$$\mathbf{A} \mathbf{V} = \mathbf{V} \mathbf{D}.$$

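If we find that $\mathbf{V}$ is a nonsingular matrix, then we arrive at a key factorization.[1] Multiplying on the right by $\mathbf{V}^{-1}$ gives $\mathbf{A} = \mathbf{V}\mathbf{D}\mathbf{V}^{-1}$, an eigenvalue decomposition (EVD) of $\mathbf{A}$; a matrix that has an EVD is called diagonalizable.

Observe that if $\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$ for nonzero $\mathbf{v}$, then the equation remains true for any nonzero multiple of $\mathbf{v}$. Therefore, eigenvectors are not unique, and thus neither is an EVD.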
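These relations are easy to check numerically. The following sketch uses `numpy.linalg.eig` on an example matrix chosen here; it verifies $\mathbf{A}\mathbf{V}=\mathbf{V}\mathbf{D}$, then $\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}$, and finally that rescaling the eigenvector columns (by arbitrary nonzero factors picked for illustration) yields another valid EVD.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Columns of V are eigenvectors; lam holds the matching eigenvalues.
lam, V = np.linalg.eig(A)
D = np.diag(lam)

residual_AV = np.linalg.norm(A @ V - V @ D)
residual_evd = np.linalg.norm(A - V @ D @ np.linalg.inv(V))

# Rescale each eigenvector by a different nonzero factor: still an EVD.
W = V @ np.diag([2.0, -3.0, 0.5])
residual_scaled = np.linalg.norm(A - W @ D @ np.linalg.inv(W))

print(residual_AV, residual_evd, residual_scaled)
```

All three residuals should be at the level of roundoff.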

We stress that while (7.2.6) is possible for all square matrices, (7.2.7) is not. One simple example of a nondiagonalizable matrix is

$$\mathbf{B} = \begin{bmatrix} 1 & 1\\0 & 1 \end{bmatrix}.$$
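A numerical look at $\mathbf{B}$ hints at the trouble. In the sketch below, the rank of $\mathbf{B}-\mathbf{I}$ shows that the eigenvalue 1 has only a one-dimensional space of eigenvectors. NumPy still returns a matrix `V`, but its columns are numerically parallel, so its condition number is enormous; the exact value printed may depend on the underlying LAPACK build.

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

lam, V = np.linalg.eig(B)

# Dimension of the null space of B - 1*I, i.e., the number of independent eigenvectors.
null_dim = 2 - np.linalg.matrix_rank(B - np.eye(2))

# The computed "eigenvector matrix" is singular or nearly so.
cond_V = np.linalg.cond(V)

print(lam, null_dim, cond_V)
```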

There is a common circumstance in which we can guarantee an EVD exists. The proof of the following theorem can be found in many elementary texts on linear algebra.

7.2.3 Similarity and matrix powers

The particular relationship between matrices $\mathbf{A}$ and $\mathbf{D}$ in (7.2.7) is important.

A similarity transformation does not change eigenvalues, a fact that is typically proved in elementary linear algebra texts.
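As a quick numerical check, and assuming the usual meaning of a similarity transformation, $\mathbf{A} \mapsto \mathbf{X}\mathbf{A}\mathbf{X}^{-1}$ for nonsingular $\mathbf{X}$, the sketch below compares eigenvalues before and after. Both `A` and `X` are arbitrary examples chosen here.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Any nonsingular X will do; this one has determinant 3.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

S = X @ A @ np.linalg.inv(X)

eig_A = np.sort(np.linalg.eigvals(A))
eig_S = np.sort(np.linalg.eigvals(S))

print(eig_A)
print(eig_S)
```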

The EVD is especially useful for matrix powers. To begin,

$$\mathbf{A}^2=(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})=\mathbf{V}\mathbf{D}(\mathbf{V}^{-1}\mathbf{V})\mathbf{D}\mathbf{V}^{-1}=\mathbf{V}\mathbf{D}^2\mathbf{V}^{-1}.$$

Multiplying this result by $\mathbf{A}$ repeatedly, we find that

$$\mathbf{A}^k = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}.$$

Because $\mathbf{D}$ is diagonal, its power $\mathbf{D}^k$ is just the diagonal matrix of the $k$th powers of the eigenvalues.
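A minimal sketch of this identity, using an example matrix and exponent chosen here, compares $\mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}$ against repeated multiplication via `numpy.linalg.matrix_power`.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # eigenvalues are 5 and 2
k = 6

lam, V = np.linalg.eig(A)

# D^k is diagonal with entries lam**k, so no matrix products are needed for it.
Ak_evd = V @ np.diag(lam**k) @ np.linalg.inv(V)

# Reference: repeated matrix multiplication.
Ak_direct = np.linalg.matrix_power(A, k)

print(np.linalg.norm(Ak_evd - Ak_direct))
```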

Furthermore, given a polynomial $p(z)=c_0+c_1 z + \cdots + c_m z^m$, we can apply the polynomial to the matrix in a straightforward way,

$$p(\mathbf{A}) = c_0\mathbf{I} +c_1 \mathbf{A} + \cdots + c_m \mathbf{A}^m.$$

Applying (7.2.10) leads to

$$\begin{split} p(\mathbf{A}) & = c_0\mathbf{V}\mathbf{V}^{-1} +c_1 \mathbf{V}\mathbf{D}\mathbf{V}^{-1} + \cdots + c_m \mathbf{V}\mathbf{D}^m\mathbf{V}^{-1} \\ &= \mathbf{V} \cdot [ c_0\mathbf{I} +c_1 \mathbf{D} + \cdots + c_m \mathbf{D}^m] \cdot \mathbf{V}^{-1} \\[1mm] &= \mathbf{V} \cdot \begin{bmatrix} p(\lambda_1) & & & \\ & p(\lambda_2) & & \\ & & \ddots & \\ & & & p(\lambda_n) \end{bmatrix} \cdot \mathbf{V}^{-1}. \end{split}$$

Finally, given the convergence of Taylor polynomials to common functions, we are able to apply a function $f$ to a square matrix by replacing $p$ with $f$ in (7.2.11).
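To make this concrete, here is a sketch that applies $f=\exp$ to a matrix through its EVD and compares the result with a partial sum of the Taylor series. The helper name `matrix_function`, the example matrix, and the number of Taylor terms are all choices made for this illustration, and the approach assumes the matrix is diagonalizable with a reasonably conditioned $\mathbf{V}$.

```python
import numpy as np

def matrix_function(f, A):
    """Apply a scalar function f to a diagonalizable matrix A through its EVD."""
    lam, V = np.linalg.eig(A)
    return V @ np.diag(f(lam)) @ np.linalg.inv(V)

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

expA_evd = matrix_function(np.exp, A)

# Reference: partial sum of the Taylor series I + A + A^2/2! + ...
expA_taylor = np.eye(2)
term = np.eye(2)
for k in range(1, 40):
    term = term @ A / k          # term now holds A^k / k!
    expA_taylor = expA_taylor + term

print(np.linalg.norm(expA_evd - expA_taylor))
```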

7.2.4 Conditioning of eigenvalues

Just as linear systems have condition numbers that quantify the effect of finite precision, eigenvalue problems may be poorly conditioned too. While many possible results can be derived, we will use just one, the Bauer–Fike theorem.

The Bauer–Fike theorem tells us that eigenvalues can be perturbed by an amount that is $\kappa(\mathbf{V})$ times larger than perturbations to the matrix. This result is a bit less straightforward than it might seem: eigenvectors are not unique, so there are multiple possible values for $\kappa(\mathbf{V})$. Even so, the theorem indicates caution when a matrix has eigenvectors that form an ill-conditioned matrix. The limiting case of $\kappa(\mathbf{V})=\infty$ might be interpreted as indicating a nondiagonalizable matrix $\mathbf{A}$. The other extreme is also of interest: $\kappa(\mathbf{V})=1$, which implies that $\mathbf{V}$ is unitary.

As we will see in Symmetry and definiteness, hermitian and real symmetric matrices are normal. Since the condition number of a unitary matrix is equal to 1, (7.2.13) guarantees that a perturbation of a normal matrix changes the eigenvalues by the same amount or less.
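The sketch below illustrates both situations numerically. It assumes the standard 2-norm form of the Bauer–Fike bound: every eigenvalue $\mu$ of $\mathbf{A}+\mathbf{E}$ lies within $\kappa_2(\mathbf{V})\,\|\mathbf{E}\|_2$ of some eigenvalue of $\mathbf{A}$. The random test matrices, the perturbation size, and the helper `max_eig_shift` are illustrative choices, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
E = 1e-6 * rng.standard_normal((n, n))   # small perturbation

def max_eig_shift(A, E):
    """Largest distance from an eigenvalue of A + E to the nearest eigenvalue of A."""
    lam = np.linalg.eigvals(A)
    mu = np.linalg.eigvals(A + E)
    return max(np.min(np.abs(m - lam)) for m in mu)

# General (non-normal) matrix: the bound includes the factor cond(V).
_, V = np.linalg.eig(A)
bound_general = np.linalg.cond(V) * np.linalg.norm(E, 2)
shift_general = max_eig_shift(A, E)

# Real symmetric (hence normal) matrix: V can be taken unitary, so cond(V) = 1.
S = A + A.T
bound_normal = np.linalg.norm(E, 2)
shift_normal = max_eig_shift(S, E)

print(shift_general, bound_general)
print(shift_normal, bound_normal)
```

In each printed pair, the observed shift should not exceed the bound that follows it.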

7.2.5 Computing the EVD

Roots of the characteristic polynomial are not used in numerical methods for finding eigenvalues.[2] Practical algorithms for computing the EVD go beyond the scope of this book. The essence of the matter is the connection to matrix powers indicated in (7.2.10). (We will see much more about the importance of matrix powers in Chapter 8.)

If the eigenvalues have different complex magnitudes, then as $k\to\infty$ the entries on the diagonal of $\mathbf{D}^k$ become increasingly well separated and easy to pick out. It turns out that there is an astonishingly easy and elegant way to accomplish this separation without explicitly computing the matrix powers.

The process demonstrated in Demo 7.2.4 is known as the Francis QR iteration, and it can be formulated as an $O(n^3)$ algorithm for finding the EVD. It forms the basis of most practical eigenvalue computations, at least until the matrix size approaches $10^4$ or so.
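The demo itself is not reproduced here. As a rough sketch of the idea, and not of a production algorithm, the basic unshifted iteration repeatedly factors $\mathbf{A}_k=\mathbf{Q}\mathbf{R}$ and forms $\mathbf{A}_{k+1}=\mathbf{R}\mathbf{Q}$, which is a similarity transformation of $\mathbf{A}_k$. The target eigenvalues, the symmetric test matrix, and the iteration count below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric matrix with known eigenvalues of distinct magnitudes.
target = np.array([-4.0, 3.0, 2.0, -1.0])
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
A = Q0 @ np.diag(target) @ Q0.T

# Unshifted QR iteration: factor, then multiply the factors in reverse order.
Ak = A.copy()
for _ in range(150):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q          # equals Q.T @ Ak @ Q, so eigenvalues are preserved

diag_sorted = np.sort(np.diag(Ak))
below_diag = np.linalg.norm(np.tril(Ak, -1))

print(diag_sorted)
print(below_diag)
```

The entries below the diagonal decay toward zero, leaving the eigenvalues on the diagonal. Practical implementations add a Hessenberg reduction and shifts, which this sketch omits.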

7.2.6 Exercises

  1. (a) ✍ Suppose that matrix $\mathbf{A}$ has an eigenvalue $\lambda$. Show that for any induced matrix norm, $\| \mathbf{A} \|\ge |\lambda|$.

    (b) ✍ Find a matrix $\mathbf{A}$ such that $\| \mathbf{A} \|_2$ is strictly larger than $|\lambda|$ for all eigenvalues $\lambda$. (Proof-by-computer isn’t allowed here. You don’t need to compute $\| \mathbf{A} \|_2$ exactly, just a lower bound for it.)

  2. ✍ Prove that the matrix $\mathbf{B}$ in (7.2.8) does not have two independent eigenvectors.

  3. ⌨ Find the eigenvalues of each matrix. Then, for each eigenvalue $\lambda$, use rank to verify that $\lambda\mathbf{I}$ minus the given matrix is singular.

    $$\mathbf{A} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}\qquad \mathbf{B} = \begin{bmatrix} 2 & -1 & -1 \\ -2 & 2 & -1 \\ -1 & -2 & 2 \end{bmatrix} \qquad \mathbf{C} = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$$

    $$\mathbf{D} = \begin{bmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & 1 & 3 \end{bmatrix}\qquad \mathbf{E} = \begin{bmatrix} 4 & -3 & -2 & -1\\ -2 & 4 & -2 & -1 \\ -1 & -2 & 4 & -1 \\ -1 & -2 & -1 & 4 \end{bmatrix}$$

  4. (a) ✍ Show that the eigenvalues of a diagonal $n\times n$ matrix $\mathbf{D}$ are the diagonal entries of $\mathbf{D}$. (That is, produce the associated eigenvectors.)

    (b) ✍ The eigenvalues of a triangular matrix are its diagonal entries. Prove this in the $3\times 3$ case,

    $$\mathbf{T} = \begin{bmatrix} t_{11} & t_{12}& t_{13}\\ 0 & t_{22} & t_{23} \\ 0 & 0 & t_{33} \end{bmatrix},$$

    by finding the eigenvectors. (Start by showing that $[1,0,0]^T$ is an eigenvector. Then show how to make $[a,1,0]^T$ an eigenvector, except for one case that does not change the outcome. Continue the same logic for $[a,b,1]^T$.)

  5. ✍ Let $\mathbf{A}=\displaystyle\frac{\pi}{6}\begin{bmatrix} 4 & 1 \\ 4 & 4 \end{bmatrix}$.

    (a) Show that

    $$\lambda_1=\pi,\, \mathbf{v}_1=\begin{bmatrix}1 \\ 2 \end{bmatrix}, \quad \lambda_2=\frac{\pi}{3},\, \mathbf{v}_2=\begin{bmatrix}1 \\ -2 \end{bmatrix}$$

    yield an EVD of $\mathbf{A}$.

    (b) Use (7.2.12) to evaluate $p(\mathbf{A})$, where $p(x) = (x-\pi)^4$.

    (c) Use the function analog of (7.2.12) to evaluate $\cos(\mathbf{A})$.

  6. ⌨ In Exercise 2.3.5, you showed that the displacements of point masses placed along a string satisfy a linear system $\mathbf{A}\mathbf{q}=\mathbf{f}$ for an $(n-1)\times(n-1)$ matrix $\mathbf{A}$. The eigenvalues and eigenvectors of $\mathbf{A}$ correspond to resonant frequencies and modes of vibration of the string. For $n=40$ and the physical parameters given in part (b) of that exercise, find the eigenvalue decomposition of $\mathbf{A}$. Report the three eigenvalues with smallest absolute value, and plot all three associated eigenvectors on a single graph (as functions of the vector row index).

  7. Demo 7.2.4 suggests that the result of the Francis QR iteration as $k\to\infty$ sorts the eigenvalues on the diagonal according to a particular ordering. Following the code there as a model, create a random matrix with eigenvalues equal to $-9.6,-8.6,\ldots,10.4$, perform the iteration 200 times, and check whether the sorting criterion holds in your experiment as well.

  8. ⌨ Eigenvalues of random matrices and their perturbations can be very interesting.

    (a) Let A=randn(60,60).[3] Scatter plot its eigenvalues in the complex plane, using a plot aspect ratio of 1 and red diamonds as markers.

    (b) Let $\mathbf{E}$ be another random $60\times 60$ matrix, and on top of the previous graph, plot the eigenvalues of $\mathbf{A}+0.05\mathbf{E}$ as blue dots. Repeat this for 100 different values of $\mathbf{E}$.

    (c) Let T=triu(A). On a new graph, scatter plot the eigenvalues of $\mathbf{T}$ in the complex plane. (They all lie on the real axis.)

    (d) Repeat part (b) with $\mathbf{T}$ in place of $\mathbf{A}$.

    (e) Compute some condition numbers and apply Theorem 7.2.3 to explain the dramatic difference between your plots with respect to the dot distributions.

Footnotes
  1. The terms “factorization” and “decomposition” are equivalent; they coexist mainly for historical reasons.

  2. In fact, the situation is reversed: eigenvalue methods are among the best ways to compute the roots of a given polynomial.

  3. The randn function generates random numbers from a standard normal distribution. In Python, it is found in the numpy.random module.