
Power iteration

Given that matrix-vector multiplication is fast for sparse matrices, let’s see what we might accomplish with only that at our disposal.

There was a little cheating in Demo 8.2.1 to make the story come out neatly (specifically, the line `A=A./sum(A,dims=1)`). But it illustrates an important general fact that we investigate now.

8.2.1 Dominant eigenvalue

Analysis of matrix powers is most straightforward in the diagonalizable case. Let $\mathbf{A}$ be any diagonalizable $n\times n$ matrix having eigenvalues $\lambda_1,\ldots,\lambda_n$ and corresponding linearly independent eigenvectors $\mathbf{v}_1,\ldots,\mathbf{v}_n$. Furthermore, suppose the eigenvalues are such that

$$
|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|. \tag{8.2.1}
$$

Given (8.2.1), we say that $\lambda_1$ is the dominant eigenvalue. This was the case with $\lambda_1=1$ for $\mathbf{A}$ in Demo 8.2.1.

Now let $\mathbf{x}$ be an $n$-vector, let $k$ be a positive integer, and refer to (7.2.10):

$$
\mathbf{A}^k \mathbf{x} = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}\mathbf{x}. \tag{8.2.2}
$$

Let $\mathbf{z}=\mathbf{V}^{-1}\mathbf{x}$, and recall that $\mathbf{D}$ is a diagonal matrix of eigenvalues. Then

$$
\begin{split}
\mathbf{A}^k\mathbf{x} &= \mathbf{V}\mathbf{D}^k \mathbf{z} = \mathbf{V}\begin{bmatrix} \lambda_1^k z_1 \\ \lambda_2^k z_2 \\ \vdots \\ \lambda_n^k z_n \end{bmatrix} \\
&= \lambda_1^k \left[ z_1 \mathbf{v}_{1} + z_2 \left(\frac{\lambda_2}{\lambda_1}\right)^{k} \mathbf{v}_{2} + \cdots + z_n \left(\frac{\lambda_n}{\lambda_1}\right)^{k} \mathbf{v}_{n} \right].
\end{split} \tag{8.2.3}
$$

Since $\lambda_1$ is dominant, we conclude that if $z_1\neq 0$,

$$
\left\| \frac{\mathbf{A}^k\mathbf{x}}{\lambda_1^k} - z_1\mathbf{v}_1 \right\| \le |z_2|\cdot\left|\frac{\lambda_2}{\lambda_1}\right|^{k} \| \mathbf{v}_{2} \| + \cdots + |z_n|\cdot\left|\frac{\lambda_n}{\lambda_1}\right|^{k} \| \mathbf{v}_{n} \| \rightarrow 0 \text{ as } k\rightarrow \infty. \tag{8.2.4}
$$

That is, $\mathbf{A}^k\mathbf{x}$ eventually is a scalar multiple of the dominant eigenvector.[1]
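To make (8.2.4) concrete, here is a tiny, self-contained Julia experiment (the matrix, starting vector, and iteration count are arbitrary choices for illustration): repeated multiplication by $\mathbf{A}$ drives a vector toward the direction of the dominant eigenvector.

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 0.0]            # small example with dominant eigenvalue 1 + √2 ≈ 2.414
λ, V = eigen(A)
v1 = V[:, argmax(abs.(λ))]        # unit eigenvector of the largest-magnitude eigenvalue

x = normalize(rand(2))
for k in 1:40
    global x = normalize(A * x)   # rescaling here only keeps the numbers tame; see below
end

@show abs(dot(x, v1))             # ≈ 1 when x is parallel (up to sign) to v1
```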

8.2.2 Power iteration

An important technicality separates us from an algorithm: unless $|\lambda_1|=1$, the factor $\lambda_1^k$ tends to make $\|\mathbf{A}^k\mathbf{x}\|$ either very large or very small. Nor can we easily normalize by $\lambda_1^k$, as in (8.2.4), unless we know $\lambda_1$ in advance.
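A quick way to see the scaling problem (with a matrix chosen only so the growth is obvious): without any rescaling, the iterates overflow long before the direction is of any use.

```julia
using LinearAlgebra

A = [3.0 1.0; 0.0 2.0]    # dominant eigenvalue 3, so ‖Aᵏx‖ grows roughly like 3ᵏ
x = [1.0, 1.0]
for k in 1:700
    global x = A * x      # no normalization
end
@show norm(x)             # overflows to Inf in double precision
```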

To make a practical algorithm, we alternate matrix-vector multiplication with a renormalization of the vector. In the following, we use $x_{k,m}$ and $y_{k,m}$ to mean the $m$th component of the vectors $\mathbf{x}_k$ and $\mathbf{y}_k$.

Note that by definition, $\|\mathbf{x}_{k+1}\|_\infty=1$. Also, we can write

$$
\mathbf{x}_{k} = (\alpha_1 \alpha_2 \cdots \alpha_k) \mathbf{A}^k \mathbf{x}_{1}. \tag{8.2.5}
$$

Thus the iterates of Algorithm 8.2.1 differ from the vectors in (8.2.3) and (8.2.4) only by a scalar factor, so the analysis above still applies.

Finally, if $\mathbf{x}_k$ is nearly a dominant eigenvector of $\mathbf{A}$, then $\mathbf{A}\mathbf{x}_k$ is nearly $\lambda_1\mathbf{x}_k$, and we can take the ratio $\beta_k=y_{k,m}/x_{k,m}$ as an eigenvalue estimate. In fact, revisiting (8.2.3), the extra $\alpha_j$ normalization factors cancel in the ratio, and, after some simplification, we get

$$
\beta_k = \frac{y_{k,m}}{x_{k,m}} = \lambda_1 \frac{1+r_2^{k+1} b_2 + \cdots + r_n^{k+1} b_n}{1+r_2^{k} b_2 + \cdots + r_n^{k} b_n}, \tag{8.2.6}
$$

where $r_j=\lambda_j/\lambda_1$ and the $b_j$ are constants. By assumption (8.2.1), each $r_j$ satisfies $|r_j|<1$, so we see that $\beta_k\rightarrow \lambda_1$ as $k\rightarrow\infty$.
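Although the argument does not require it, the constants can be written out explicitly. Treating the component index $m$ as fixed and assuming $z_1 v_{1,m}\neq 0$, expanding $\mathbf{A}^k\mathbf{x}_1$ as in (8.2.3) and cancelling the $\alpha_j$ factors gives

$$
b_j = \frac{z_j\, v_{j,m}}{z_1\, v_{1,m}}, \qquad j=2,\ldots,n.
$$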

Function 8.2.2 is our implementation of power iteration.

Observe that the only use of $\mathbf{A}$ is to find the matrix-vector product $\mathbf{A}\mathbf{x}$, which makes exploitation of the sparsity of $\mathbf{A}$ trivial.
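The listing of Function 8.2.2 is not reproduced in this section. As a stand-in, here is a minimal Julia sketch of the same idea, under the hypothetical name `poweriter_sketch` (the real function's interface and details may differ):

```julia
using LinearAlgebra

# Minimal sketch of power iteration with infinity-norm normalization.
# Returns the sequence of eigenvalue estimates β₁, β₂, … and the final iterate.
function poweriter_sketch(A, numiter)
    n = size(A, 1)
    x = normalize(randn(n), Inf)   # random start with ‖x‖_∞ = 1
    β = zeros(numiter)
    for k in 1:numiter
        y = A * x                  # the only use of A: a single matrix-vector product
        m = argmax(abs.(y))        # index of the largest-magnitude component of y
        β[k] = y[m] / x[m]         # eigenvalue estimate β_k = y_{k,m} / x_{k,m}
        x = y / y[m]               # rescale so that ‖x‖_∞ = 1 again
    end
    return β, x
end
```

Because `A` enters only through the product `A * x`, passing a sparse matrix automatically exploits its sparsity.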

8.2.3 Convergence

Let’s examine the terms in the numerator and denominator of (8.2.6) more carefully:

$$
\begin{split}
r_2^{k} b_2 + \cdots + r_n^{k} b_n &= r_2^k \left[ b_2 + \left( \frac{r_3}{r_2} \right)^{k} b_3 + \cdots + \left( \frac{r_n}{r_2} \right)^{k} b_n \right] \\
&= r_2^k \left[ b_2 + \left( \frac{\lambda_3}{\lambda_2} \right)^{k} b_3 + \cdots + \left( \frac{\lambda_n}{\lambda_2} \right)^{k} b_n \right].
\end{split} \tag{8.2.7}
$$

At this point we’ll introduce an additional assumption,

$$
|\lambda_2| > |\lambda_3| \ge \cdots \ge |\lambda_n|. \tag{8.2.8}
$$

This condition isn't strictly necessary, but it simplifies the following statements considerably, because now it's clear that the quantity in (8.2.7) approaches $b_2 r_2^k$ as $k\rightarrow \infty$.
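For reference, the expansion invoked next is just the geometric series applied to the denominator of (8.2.6). Writing $\epsilon_k = b_2 r_2^{k} + \cdots + b_n r_n^{k}$ (a shorthand introduced only here),

$$
\frac{1}{1+\epsilon_k} = 1 - \epsilon_k + \epsilon_k^2 - \cdots,
$$

and by (8.2.7) and (8.2.8), $\epsilon_k$ behaves like $b_2 r_2^{k}$ once $k$ is large.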

Next we estimate (8.2.6) for large kk, using a geometric series expansion for the denominator to get

$$
\begin{split}
\beta_k & \approx \lambda_1 \left( 1+b_2 r_2^{k+1} \right) \left( 1 - b_2 r_2^{k} + O(r_2^{2k}) \right), \\
\beta_k - \lambda_1 &\approx \lambda_1 b_2 ( r_2 - 1 ) r_2^{k}.
\end{split} \tag{8.2.9}
$$

This is linear convergence with factor $r_2$:

$$
\frac{\beta_{k+1} - \lambda_1}{\beta_{k}-\lambda_1} \rightarrow r_2 = \frac{\lambda_2}{\lambda_1} \quad \text{as } k\rightarrow \infty. \tag{8.2.10}
$$

The practical utility of (8.2.10) is limited because if we knew $\lambda_1$ and $\lambda_2$, we wouldn't be running the power iteration in the first place! Sometimes it's possible to find estimates of or bounds on the ratio. If nothing else, though, it is useful to know that linear convergence is expected at a rate based solely on the dominant eigenvalues.
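Even without knowing the eigenvalues in advance, the observed convergence factor can be estimated from the computed sequence itself, since by (8.2.9) successive errors shrink by roughly $|r_2|$ per step. Here is a small sketch, reusing the `poweriter_sketch` function from above and a matrix chosen only for illustration:

```julia
using LinearAlgebra

A = [4.0 1.0; 0.0 2.0]                # eigenvalues 4 and 2, so the predicted factor is |λ₂/λ₁| = 0.5
λ = eigvals(A)
λ₁ = λ[argmax(abs.(λ))]               # dominant eigenvalue, used here only as a reference

β, _ = poweriter_sketch(A, 20)
err = @. abs(β - λ₁)
ratios = err[2:end] ./ err[1:end-1]   # successive error ratios
@show ratios[end]                     # should be close to 0.5 for this example
```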

8.2.4 Exercises

  1. ⌨ Use Function 8.2.2 to perform 20 power iterations for the following matrices. Quantitatively compare the observed convergence to the prediction in (8.2.10).

    (a) $\mathbf{A} = \begin{bmatrix} 1.1 & 1 \\ 0.1 & 2.4 \end{bmatrix}$  (b) $\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}$  (c) $\mathbf{A} = \begin{bmatrix} 6 & 5 & 4 \\ 5 & 4 & 3 \\ 4 & 3 & 2 \end{bmatrix}$

    (d) $\mathbf{A} = \begin{bmatrix} 8 & -14 & 0 & -14 \\ -8 & 1 & 1 & 1 \\ -4 & -2 & 0 & 2 \\ 8 & -7 & -1 & -7 \end{bmatrix}$

  2. ✍ Describe what happens during power iteration using the matrix $\mathbf{A}= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ and initial vector $\mathbf{x}=\begin{bmatrix} 0.4\\0.7 \end{bmatrix}$. Does the algorithm converge to an eigenvector? How does this relate to (8.2.3)?

  3. ⌨ In Exercise 2.3.5 we considered a mass-lumped model of a hanging string that led to a tridiagonal system of linear equations. Then, in Exercise 7.2.6, we found that eigenvectors of the same matrix correspond to vibrational modes of the string. The same setup can be applied to a membrane hanging from a square frame. Lumping the mass onto a Cartesian grid, each mass interacts with its four neighbors to the north, south, east, and west. If $n$ masses are used in each coordinate direction, we get an $n^2\times n^2$ sparse matrix $\mathbf{A}$ that can be constructed by `FNC.poisson(n)`.

    (a) Let $n=10$ and make a spy plot of $\mathbf{A}$. What is the density of $\mathbf{A}$? Most rows have the same number of nonzeros; find this number.

    (b) Find the dominant $\lambda_1$ using `eigs` for $n=10,15,20,25$.

    (c) For each $n$ in part (b), apply 100 steps of Function 8.2.2. On one graph, plot the four convergence curves $|\beta_k-\lambda_1|$ using a semi-log scale. (They will not be smooth curves because this matrix has many repeated eigenvalues that complicate the convergence analysis.)

  4. ⌨ Copy the instructions from Exercise 8.1.5 to obtain a large, sparse matrix $\mathbf{A}$. Use Function 8.2.2 to find the leading eigenvalue of $\mathbf{A}^T\mathbf{A}$ to at least six significant digits.

  5. ⌨ For symmetric matrices, the Rayleigh quotient (7.4.5) converts an $O(\epsilon)$ eigenvector estimate into an $O(\epsilon^2)$ eigenvalue estimate. Duplicate Function 8.2.2 and rename it `powersym`. Modify the new function to use the Rayleigh quotient to produce the entries of $\beta$. Your function should not introduce any additional matrix-vector multiplications. Apply both the original Function 8.2.2 and the new `powersym` to the matrix `matrixdepot("fiedler",100)`, plotting the convergence curves on one graph.

Footnotes
  1. If $\mathbf{x}$ is chosen randomly, the probability that $z_1=0$ is mathematically zero.