
Conditioning of linear systems

We are ready to consider the conditioning of solving the square linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$. In this problem, the data are $\mathbf{A}$ and $\mathbf{b}$, and the solution is $\mathbf{x}$. Both data and result are multidimensional, so we will use norms to measure their magnitudes.

The motivation for the definition of relative condition number in Chapter 1 was to quantify the response of the result to perturbations of the data. For simplicity, we start by allowing perturbations to $\mathbf{b}$ only while $\mathbf{A}$ remains fixed.

Let $\mathbf{A}\mathbf{x}=\mathbf{b}$ be perturbed to

$$\mathbf{A}(\mathbf{x}+\mathbf{h}) = \mathbf{b}+\mathbf{d}.$$

The condition number should be the relative change in the solution divided by the relative change in the data,

$$\frac{\quad\dfrac{\| \mathbf{h} \| }{\| \mathbf{x} \| }\quad}{\dfrac{\| \mathbf{d} \| }{\| \mathbf{b} \|}} = \frac{\| \mathbf{h} \|\;\| \mathbf{b} \| }{\| \mathbf{d} \|\; \| \mathbf{x} \| }.$$

We can bound $\| \mathbf{h} \|$ in terms of $\| \mathbf{d} \|$:

$$\begin{split} \mathbf{A}\mathbf{x} + \mathbf{A} \mathbf{h} &= \mathbf{b} + \mathbf{d}, \\ \mathbf{A} \mathbf{h} &= \mathbf{d},\\ \mathbf{h} &= \mathbf{A}^{-1} \mathbf{d},\\ \| \mathbf{h} \| &\le \| \mathbf{A}^{-1}\| \,\| \mathbf{d} \|, \end{split}$$

where we have applied $\mathbf{A}\mathbf{x}=\mathbf{b}$ and (2.7.8). Since also $\mathbf{b}=\mathbf{A}\mathbf{x}$ implies $\| \mathbf{b} \|\le \| \mathbf{A} \|\, \| \mathbf{x} \|$, we derive

$$\frac{\| \mathbf{h} \|\; \| \mathbf{b} \|}{\| \mathbf{d} \|\; \| \mathbf{x} \|} \le \frac{\bigl(\| \mathbf{A}^{-1} \|\, \| \mathbf{d} \|\bigr) \bigl(\| \mathbf{A} \|\,\| \mathbf{x} \|\bigr)}{\| \mathbf{d} \|\,\| \mathbf{x} \|} = \| \mathbf{A}^{-1}\| \, \| \mathbf{A} \|.$$

It is possible to show that this bound is tight, in the sense that the inequalities are in fact equalities for some choices of $\mathbf{b}$ and $\mathbf{d}$. This result motivates a new definition: the matrix condition number of an invertible square matrix $\mathbf{A}$ is $\kappa(\mathbf{A}) = \| \mathbf{A}^{-1} \| \, \| \mathbf{A} \|$, taken to be infinite when $\mathbf{A}$ is singular.

2.8.1 Main result

The matrix condition number (2.8.5) is equal to the condition number of solving a linear system of equations. Although we derived this fact above only for perturbations of $\mathbf{b}$, a similar statement holds when $\mathbf{A}$ is perturbed.

Using a traditional Δ notation for the perturbation in a quantity, we can write the following: if $\mathbf{A}(\mathbf{x}+\Delta\mathbf{x}) = \mathbf{b}+\Delta\mathbf{b}$, then

$$\frac{\| \Delta\mathbf{x} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \Delta\mathbf{b} \|}{\| \mathbf{b} \|},$$

and if $(\mathbf{A}+\Delta\mathbf{A})(\mathbf{x}+\Delta\mathbf{x}) = \mathbf{b}$, then, in the limit $\| \Delta\mathbf{A} \| \to 0$,

$$\frac{\| \Delta\mathbf{x} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \Delta\mathbf{A} \|}{\| \mathbf{A} \|}.$$
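As a quick numerical check of the first bound, consider the following Julia sketch. It is an illustration, not part of the text: the matrix, right-hand side, and perturbation size are hypothetical choices, and only the standard library LinearAlgebra is assumed.

```julia
using LinearAlgebra

A = [1.0 2 3; 4 5 7; 8 10 13]   # an arbitrary nonsingular matrix (hypothetical example)
b = [1.0, 2, 3]
x = A \ b

Δb = 1e-10 * randn(3)           # a small random perturbation of the data
Δx = (A \ (b + Δb)) - x         # the induced change in the solution

lhs = norm(Δx) / norm(x)              # relative change in the solution
rhs = cond(A) * norm(Δb) / norm(b)    # bound from the statement above
@show lhs ≤ rhs                       # true; equality requires a worst-case Δb
```

A random Δb rarely achieves the worst case, so the left side usually sits well below the bound.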

Note that for any induced matrix norm,

$$1 = \| \mathbf{I} \| = \| \mathbf{A} \mathbf{A}^{-1} \| \le \| \mathbf{A} \|\, \| \mathbf{A}^{-1} \| = \kappa(\mathbf{A}).$$

A condition number of 1 is the best we can hope for—in that case, the relative perturbation of the solution has the same size as that of the data. A condition number of size $10^t$ indicates that in floating-point arithmetic, roughly $t$ digits are lost (i.e., become incorrect) in computing the solution $\mathbf{x}$. And if $\kappa(\mathbf{A}) > \epsilon_\text{mach}^{-1}$, then for computational purposes the matrix is effectively singular.
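To see the digits-lost heuristic concretely, here is a minimal Julia sketch (our illustration, assuming the LinearAlgebra standard library; the Hilbert matrix is a standard ill-conditioned example, chosen here for convenience):

```julia
using LinearAlgebra

n = 10
A = [1 / (i + j - 1) for i in 1:n, j in 1:n]   # Hilbert matrix, notoriously ill-conditioned
x = ones(n)                                    # pick an exact solution...
b = A * x                                      # ...and construct a consistent right-hand side
x_approx = A \ b                               # computed solution

@show cond(A)                         # about 1.6e13 for n = 10, so t ≈ 13
@show norm(x - x_approx) / norm(x)    # roughly eps() * cond(A): only ~3 accurate digits remain
```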

2.8.2 Residual and backward error

Suppose that $\mathbf{A}\mathbf{x}=\mathbf{b}$ and $\tilde{\mathbf{x}}$ is a computed estimate of the solution $\mathbf{x}$. The most natural quantity to study is the error, $\mathbf{x}-\tilde{\mathbf{x}}$. Normally we can’t compute it because we don’t know the exact solution. However, we can compute something related: the residual, $\mathbf{r} = \mathbf{b} - \mathbf{A}\tilde{\mathbf{x}}$.

Obviously, a zero residual means that $\tilde{\mathbf{x}}=\mathbf{x}$, and we have the exact solution. What happens more generally? Note that $\mathbf{A}\tilde{\mathbf{x}}=\mathbf{b}-\mathbf{r}$. That is, $\tilde{\mathbf{x}}$ solves the linear system problem for a right-hand side that is changed by $-\mathbf{r}$. This is precisely what is meant by backward error.

Hence residual and backward error are the same thing for a linear system. What is the connection to the (forward) error? We can reconnect with (2.8.6) by the definition $\mathbf{h} = \tilde{\mathbf{x}}-\mathbf{x}$, in which case

$$\mathbf{d} = \mathbf{A}(\mathbf{x}+\mathbf{h})-\mathbf{b}=\mathbf{A}\mathbf{h} = -\mathbf{r}.$$

Thus (2.8.6) is equivalent to

$$\frac{\| \mathbf{x}-\tilde{\mathbf{x}} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \mathbf{r} \|}{\| \mathbf{b} \|}.$$

Equation (2.8.11) says that the relative error can exceed the relative residual by a factor of at most the matrix condition number.
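The following short Julia sketch makes the gap concrete (our illustration, assuming LinearAlgebra; the Vandermonde system is a hypothetical example). The relative residual of the backslash solution stays near machine epsilon, while the relative error can be larger by a factor approaching $\kappa(\mathbf{V})$:

```julia
using LinearAlgebra

n = 12
t = (0:n) / n                      # equally spaced nodes in [0, 1]
V = [s^j for s in t, j in 0:n]     # Vandermonde matrix; ill-conditioned for these nodes
x = ones(n + 1)
b = V * x
x_approx = V \ b                   # computed solution

@show norm(b - V * x_approx) / norm(b)   # relative residual (backward error): near eps()
@show norm(x - x_approx) / norm(x)       # relative error: can be orders of magnitude larger
@show cond(V)                            # the factor separating them, per (2.8.11)
```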

2.8.3 Exercises

  1. ⌨ Refer to Demo 2.8.1 for the definition of a Hilbert matrix. Make a table of the values of $\kappa(\mathbf{H}_n)$ in the 2-norm for $n=2,3,\ldots,16$. Speculate as to why the growth of $\kappa$ appears to slow down at $n=13$.

  2. ⌨ The purpose of this problem is to verify, like in Demo 2.8.1, the error bound

    $$\frac{\| \mathbf{x}-\tilde{\mathbf{x}} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \mathbf{d} \|}{\| \mathbf{b} \|}.$$

    Here $\tilde{\mathbf{x}}$ is a numerical approximation to the exact solution $\mathbf{x}$, and $\mathbf{d}$ is an unknown perturbation of the data caused by machine roundoff. We will assume that $\| \mathbf{d} \|/\| \mathbf{b} \|$ is roughly `eps()`.

    For each $n=10,20,\ldots,70$ let `A = matrixdepot("prolate",n,0.4)` and let $\mathbf{x}$ have components $x_k=k/n$ for $k=1,\ldots,n$. Define `b = A*x` and let $\tilde{\mathbf{x}}$ be the solution produced numerically by backslash.

    Make a table including columns for $n$, the condition number of $\mathbf{A}$, the observed relative error in $\tilde{\mathbf{x}}$, and the right-hand side of the inequality above. You should find that the inequality holds in every case.

  3. ⌨ Exercise 2.3.7 suggests that the solutions of linear systems with

    $$\mathbf{A} = \begin{bmatrix} 1 & -1 & 0 & \alpha-\beta & \beta \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} \alpha \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

    become less accurate as $\beta$ increases. Using $\alpha=0.1$ and $\beta=10,100,\ldots,10^{12}$, make a table with columns for $\beta$, $|x_1-1|$, and the condition number of the matrix.

  4. ⌨ Let $\mathbf{A}_n$ denote the $(n+1)\times(n+1)$ version of the Vandermonde matrix in Equation (2.1.3) based on the equally spaced interpolation nodes $t_i=i/n$ for $i=0,\ldots,n$. Using the 1-norm, graph $\kappa(\mathbf{A}_n)$ as a function of $n$ for $n=4,5,6,\ldots,20$, using a log scale on the $y$-axis. (The graph is nearly a straight line.)

  5. ⌨ The matrix $\mathbf{A}$ in (2.6.2) has unpivoted LU factors given in (2.6.3) as a function of the parameter $\epsilon$. For $\epsilon = 10^{-2},10^{-4},\ldots,10^{-10}$, make a table with columns for $\epsilon$, $\kappa(\mathbf{A})$, $\kappa(\mathbf{L})$, and $\kappa(\mathbf{U})$. (This shows that solution via unpivoted LU factorization is arbitrarily unstable.)

  6. ✍ Define $\mathbf{A}_n$ as the $n\times n$ matrix

    $$\begin{bmatrix} 1 & -2 & & &\\ & 1 & -2 & & \\ & & \ddots & \ddots & \\ & & & 1 & -2 \\ & & & & 1 \end{bmatrix}.$$

    (a) Write out $\mathbf{A}_2^{-1}$ and $\mathbf{A}_3^{-1}$.

    (b) Write out $\mathbf{A}_n^{-1}$ in the general case $n>1$. (If necessary, look at a few more cases in Julia until you are certain of the pattern.) Make a clear argument why it is correct.

    (c) Using the $\infty$-norm, find $\kappa(\mathbf{A}_n)$.

  7. ✍ (a) Prove that for $n\times n$ nonsingular matrices $\mathbf{A}$ and $\mathbf{B}$, $\kappa(\mathbf{A}\mathbf{B})\le \kappa(\mathbf{A})\,\kappa(\mathbf{B})$.

    (b) Show by means of an example that the result of part (a) cannot be an equality in general.

  8. ✍ Let $\mathbf{D}$ be a diagonal $n\times n$ matrix, not necessarily invertible. Prove that in the 1-norm,

    $$\kappa(\mathbf{D}) = \frac{\max_i |D_{ii}|}{\min_i |D_{ii}|}.$$

    (Hint: See Exercise 2.7.10.)