Skip to article frontmatterSkip to article content

Matrix-free iterations

A primary reason for our interest in matrices is their relationship to linear transformations. If we define f(x)=Ax\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}, then for all vectors x\mathbf{x}, y\mathbf{y}, and scalars α,

f(x+y)=f(x)+f(y),f(αx)=αf(x).\begin{split} \mathbf{f}(\mathbf{x} + \mathbf{y} ) &= \mathbf{f}(\mathbf{x}) + \mathbf{f}(\mathbf{y} ), \\ \mathbf{f}(\alpha \mathbf{x} ) & = \alpha\, \mathbf{f}(\mathbf{x}). \end{split}

These properties define a linear transformation. Moreover, every linear transformation between finite-dimensional vector spaces can be represented as a matrix-vector multiplication.

8.7.1Matrix-free iterations

In Chapter 4 we solved the nonlinear rootfinding problem f(x)=0\mathbf{f}(\mathbf{x})=\boldsymbol{0} with methods that needed only the ability to evaluate f\mathbf{f} at any known value of x\mathbf{x}. By repeatedly evaluating f\mathbf{f} at cleverly chosen points, these algorithms were able to return an estimate for f1(0)\mathbf{f}^{-1}(\boldsymbol{0}).

A close examination reveals that the power method and Krylov subspace methods have the same structure because the only appearance of the matrix A\mathbf{A} in them is to multiply a known vector, i.e., to evaluate f(x)=Ax\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}. This is used to evaluate the inverse, A1b\mathbf{A}^{-1}\mathbf{b}.

Bringing these points of view together leads us to a cornerstone of modern scientific computation: matrix-free iterations. Krylov subspace methods can be used to invert a linear transformation if one provides code for the transformation, even if its associated matrix is not known explicitly.

8.7.2Blurring images

In From matrix to insight we saw that a grayscale image can be represented as an m×nm\times n matrix X\mathbf{X} of pixel intensity values. Now consider a simple model for blurring the image. Define B\mathbf{B} as the m×mm\times m tridiagonal matrix

Bij={12if i=j,14if ij=1,0otherwise.B_{ij} = \begin{cases} \tfrac{1}{2} & \text{if $i=j$},\\ \tfrac{1}{4} & \text{if $|i-j|=1$},\\ 0 & \text{otherwise.} \end{cases}

The product BX\mathbf{B}\mathbf{X} applies B\mathbf{B} to each column of X\mathbf{X}. Within that column it does a weighted average of the values of each pixel and its two neighbors. That has the effect of blurring the image vertically. We can increase the amount of blur by applying B\mathbf{B} repeatedly.

In order to blur horizontally, we can transpose the image and apply blurring in the same way. We need a blurring matrix defined as in (8.7.2) but with size n×nn\times n. We call this matrix C\mathbf{C}. Altogether the horizontal blurring is done by transposing, applying C\mathbf{C}, and transposing back to the original orientation. That is,

(CXT)T=XCT=XC,\bigl(\mathbf{C} \mathbf{X}^T\bigr)^T = \mathbf{X}\mathbf{C}^T = \mathbf{X}\mathbf{C},

using the symmetry of C\mathbf{C}. So we can describe blur in both directions as the function

blur(X)=BkXCk\operatorname{blur}(\mathbf{X}) = \mathbf{B}^k \mathbf{X} \mathbf{C}^k

for a positive integer kk.

8.7.3Deblurring

A more interesting operation is deblurring: given an image blurred by poor focus, can we reconstruct the true image? Conceptually, we want to invert the function blur(X)\operatorname{blur}(\mathbf{X}).

It’s easy to see from (8.7.4) that the blur operation is a linear transformation on image matrices. But an m×nm\times n image matrix is equivalent to a length-mnmn vector—it’s just a matter of interpreting the shape of the same data. Let vec(X)=x\operatorname{vec}(\mathbf{X})=\mathbf{x} and unvec(x)=X\operatorname{unvec}(\mathbf{x})=\mathbf{X} be the mathematical statements of such reshaping operations. Now say X\mathbf{X} is the original image and Z=blur(X)\mathbf{Z}=\operatorname{blur}(\mathbf{X}) is the blurred one. Then by linearity there is some matrix A\mathbf{A} such that

Avec(X)=vec(Z),\mathbf{A} \operatorname{vec}(\mathbf{X}) = \operatorname{vec}(\mathbf{Z}),

or Ax=z\mathbf{A}\mathbf{x}=\mathbf{z}.

The matrix A\mathbf{A} is mn×mnmn\times mn; for a 12-megapixel image, it would have 1.4×10141.4\times 10^{14} entries! Admittedly, it is extremely sparse, but the point is that we don’t need it at all.

Instead, given any vector u\mathbf{u} we can compute v=Au\mathbf{v}=\mathbf{A}\mathbf{u} through the steps

U=unvec(u),V=blur(U),v=vec(V).\begin{align*} \mathbf{U} &= \operatorname{unvec}(\mathbf{u}),\\ \mathbf{V} &= \operatorname{blur}(\mathbf{U}),\\ \mathbf{v} &= \operatorname{vec}(\mathbf{V}). \end{align*}

The following example shows how to put these ideas into practice with MINRES.

8.7.4Exercises

  1. ✍ Show using (8.7.1) and (8.7.4) that the blur operation is a linear transformation.

  2. ✍ In each case, state with reasons whether the given transformation on nn-vectors is linear.

    (a) f(x)=[x2x3xnx1]\,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_2\\x_3 \\\vdots\\ x_n \\ x_1 \end{bmatrix}\qquad (b) f(x)=[x1x1+x2x1+x2+x3x1++xn]\,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1\\x_1+x_2\\x_1+x_2+x_3\\\vdots\\x_1+\cdots+x_n \end{bmatrix} \qquad (c) f(x)=[x1+1x2+2x3+3xn+n]\,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1 + 1 \\x_2 + 2 \\ x_3 + 3 \\\vdots \\ x_n+n \end{bmatrix} \qquad (d) f(x)=xe1\,\mathbf{f}(\mathbf{x}) = \|\mathbf{x}\|_\infty\, \mathbf{e}_1

  3. ✍ Suppose that code for the linear transformation f(x)=Ax\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x} is given for an unknown matrix A\mathbf{A}. Explain carefully how one could construct A\mathbf{A}.

  4. ⌨ The matrix of the blur transformation happens to be symmetric and positive definite. Repeat Demo 8.7.2 using CG for the deblurring.

  5. The condition number of the matrix of the blur transformation is related to the condition numbers of the single-dimension matrices Bk\mathbf{B}^k and Ck\mathbf{C}^k in (8.7.4).

    (a) ⌨ Let m=50m=50. Show that B\mathbf{B} has a Cholesky factorization and thus is SPD. Find κ(B)\kappa(\mathbf{B}). (Note: cond requires a regular dense matrix, not a sparse matrix.)

    (b) ✍ Explain why part (a) implies κ(Bk)=κ(B)k\kappa( \mathbf{B}^k ) = \kappa(\mathbf{B})^k.

    (c) ✍ Explain two important effects of the limit kk\to \infty on deblurring by Krylov methods.

  6. The cumulative summation function cumsum is defined as

    f(x)=[x1x1+x2x1+x2++xn].\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1 \\ x_1+x_2 \\ \vdots \\ x_1 + x_2 + \cdots + x_n \end{bmatrix}.

    (a) ✍ Show that f\mathbf{f} is a linear transformation.

    (b) ⌨ Define vector b\mathbf{b} by bi=(i/100)2b_i = (i/100)^2 for i=1,,100i=1,\ldots,100. Then use gmres to find x=f1(b)\mathbf{x}=\mathbf{f}^{-1}(\mathbf{b}).

    (c) ⌨ Plot x\mathbf{x}, and explain why the result looks as it does.