
Conditioning of linear systems

We are ready to consider the conditioning of solving the square linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$. In this problem, the data are $\mathbf{A}$ and $\mathbf{b}$, and the solution is $\mathbf{x}$. Both data and result are multidimensional, so we will use norms to measure their magnitudes.

The motivation for the definition of relative condition number in Chapter 1 was to quantify the response of the result to perturbations of the data. For simplicity, we start by allowing perturbations to $\mathbf{b}$ only while $\mathbf{A}$ remains fixed.

Let $\mathbf{A}\mathbf{x}=\mathbf{b}$ be perturbed to

$$\mathbf{A}(\mathbf{x}+\mathbf{h}) = \mathbf{b}+\mathbf{d}.$$

The condition number should be the relative change in the solution divided by the relative change in the data,

$$\frac{\quad\dfrac{\| \mathbf{h} \| }{\| \mathbf{x} \| }\quad}{\dfrac{\| \mathbf{d} \| }{\| \mathbf{b} \|}} = \frac{\| \mathbf{h} \|\;\| \mathbf{b} \| }{\| \mathbf{d} \|\; \| \mathbf{x} \| }.$$

We can bound $\| \mathbf{h} \|$ in terms of $\| \mathbf{d} \|$:

$$\begin{split} \mathbf{A}\mathbf{x} + \mathbf{A} \mathbf{h} &= \mathbf{b} + \mathbf{d}, \\ \mathbf{A} \mathbf{h} &= \mathbf{d},\\ \mathbf{h} &= \mathbf{A}^{-1} \mathbf{d},\\ \| \mathbf{h} \| &\le \| \mathbf{A}^{-1}\| \,\| \mathbf{d} \|, \end{split}$$

where we have applied $\mathbf{A}\mathbf{x}=\mathbf{b}$ and (2.7.8). Since also $\mathbf{b}=\mathbf{A}\mathbf{x}$ implies $\| \mathbf{b} \|\le \| \mathbf{A} \|\, \| \mathbf{x} \|$, we derive

$$\frac{\| \mathbf{h} \|\; \| \mathbf{b} \|}{\| \mathbf{d} \|\; \| \mathbf{x} \|} \le \frac{\bigl(\| \mathbf{A}^{-1} \|\, \| \mathbf{d} \|\bigr) \bigl(\| \mathbf{A} \|\,\| \mathbf{x} \|\bigr)}{\| \mathbf{d} \|\,\| \mathbf{x} \|} = \| \mathbf{A}^{-1}\| \, \| \mathbf{A} \|.$$

It is possible to show that this bound is tight, in the sense that the inequalities are in fact equalities for some choices of $\mathbf{b}$ and $\mathbf{d}$. This result motivates a new definition: the matrix condition number of an invertible square matrix $\mathbf{A}$ is $\kappa(\mathbf{A}) = \| \mathbf{A}^{-1} \| \, \| \mathbf{A} \|$, taken to be infinite when $\mathbf{A}$ is singular.

2.8.1 Main result

The matrix condition number (2.8.5) is equal to the condition number of solving a linear system of equations. Although we derived this fact above only for perturbations of $\mathbf{b}$, a similar statement holds when $\mathbf{A}$ is perturbed.

Using a traditional Δ notation for the perturbation in a quantity, we can write the following: if $\mathbf{A}(\mathbf{x}+\Delta\mathbf{x}) = \mathbf{b}+\Delta\mathbf{b}$, then

$$\frac{\| \Delta\mathbf{x} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \Delta\mathbf{b} \|}{\| \mathbf{b} \|},$$

and if $(\mathbf{A}+\Delta\mathbf{A})(\mathbf{x}+\Delta\mathbf{x}) = \mathbf{b}$, then, in the limit $\| \Delta\mathbf{A} \| \to 0$,

$$\frac{\| \Delta\mathbf{x} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \Delta\mathbf{A} \|}{\| \mathbf{A} \|}.$$
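As a quick numerical check of the first bound, consider the following Julia sketch. It is an illustration, not part of the text: the matrix, right-hand side, and perturbation size are hypothetical choices, and only the standard library LinearAlgebra is assumed.

```julia
using LinearAlgebra

A = [1.0 2 3; 4 5 7; 8 10 13]   # an arbitrary nonsingular matrix (hypothetical example)
b = [1.0, 2, 3]
x = A \ b

Δb = 1e-10 * randn(3)           # a small random perturbation of the data
Δx = (A \ (b + Δb)) - x         # the induced change in the solution

lhs = norm(Δx) / norm(x)              # relative change in the solution
rhs = cond(A) * norm(Δb) / norm(b)    # bound from the statement above
@show lhs ≤ rhs                       # true; equality requires a worst-case Δb
```

A random Δb rarely achieves the worst case, so the left side usually sits well below the bound.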

Note that for any induced matrix norm,

$$1 = \| \mathbf{I} \| = \| \mathbf{A} \mathbf{A}^{-1} \| \le \| \mathbf{A} \|\, \| \mathbf{A}^{-1} \| = \kappa(\mathbf{A}).$$

A condition number of 1 is the best we can hope for—in that case, the relative perturbation of the solution has the same size as that of the data. A condition number of size $10^t$ indicates that in floating-point arithmetic, roughly $t$ digits are lost (i.e., become incorrect) in computing the solution $\mathbf{x}$. And if $\kappa(\mathbf{A}) > \epsilon_\text{mach}^{-1}$, then for computational purposes the matrix is effectively singular.
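To see the digits-lost heuristic concretely, here is a minimal Julia sketch (our illustration, assuming the LinearAlgebra standard library; the Hilbert matrix is a standard ill-conditioned example, chosen here for convenience):

```julia
using LinearAlgebra

n = 10
A = [1 / (i + j - 1) for i in 1:n, j in 1:n]   # Hilbert matrix, notoriously ill-conditioned
x = ones(n)                                    # pick an exact solution...
b = A * x                                      # ...and construct a consistent right-hand side
x_approx = A \ b                               # computed solution

@show cond(A)                         # about 1.6e13 for n = 10, so t ≈ 13
@show norm(x - x_approx) / norm(x)    # roughly eps() * cond(A): only ~3 accurate digits remain
```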

2.8.2 Residual and backward error

Suppose that $\mathbf{A}\mathbf{x}=\mathbf{b}$ and $\tilde{\mathbf{x}}$ is a computed estimate of the solution $\mathbf{x}$. The most natural quantity to study is the error, $\mathbf{x}-\tilde{\mathbf{x}}$. Normally we can’t compute it because we don’t know the exact solution. However, we can compute something related: the residual, $\mathbf{r} = \mathbf{b} - \mathbf{A}\tilde{\mathbf{x}}$.

Obviously, a zero residual means that $\tilde{\mathbf{x}}=\mathbf{x}$, and we have the exact solution. What happens more generally? Note that $\mathbf{A}\tilde{\mathbf{x}}=\mathbf{b}-\mathbf{r}$. That is, $\tilde{\mathbf{x}}$ solves the linear system problem for a right-hand side that is changed by $-\mathbf{r}$. This is precisely what is meant by backward error.

Hence residual and backward error are the same thing for a linear system. What is the connection to the (forward) error? We can reconnect with (2.8.6) by the definition $\mathbf{h} = \tilde{\mathbf{x}}-\mathbf{x}$, in which case

$$\mathbf{d} = \mathbf{A}(\mathbf{x}+\mathbf{h})-\mathbf{b}=\mathbf{A}\mathbf{h} = -\mathbf{r}.$$

Thus (2.8.6) is equivalent to

$$\frac{\| \mathbf{x}-\tilde{\mathbf{x}} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \mathbf{r} \|}{\| \mathbf{b} \|}.$$

Equation (2.8.11) says that the relative error can exceed the relative residual by a factor of at most the matrix condition number.
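The following short Julia sketch makes the gap concrete (our illustration, assuming LinearAlgebra; the Vandermonde system is a hypothetical example). The relative residual of the backslash solution stays near machine epsilon, while the relative error can be larger by a factor approaching $\kappa(\mathbf{V})$:

```julia
using LinearAlgebra

n = 12
t = (0:n) / n                      # equally spaced nodes in [0, 1]
V = [s^j for s in t, j in 0:n]     # Vandermonde matrix; ill-conditioned for these nodes
x = ones(n + 1)
b = V * x
x_approx = V \ b                   # computed solution

@show norm(b - V * x_approx) / norm(b)   # relative residual (backward error): near eps()
@show norm(x - x_approx) / norm(x)       # relative error: can be orders of magnitude larger
@show cond(V)                            # the factor separating them, per (2.8.11)
```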

2.8.3 Exercises

  1. ⌨ Refer to Demo 2.8.1 for the definition of a Hilbert matrix. Make a table of the values of $\kappa(\mathbf{H}_n)$ in the 2-norm for $n=2,3,\ldots,16$. Speculate as to why the growth of $\kappa$ appears to slow down at $n=13$.

  2. ⌨ The purpose of this problem is to verify, like in Demo 2.8.1, the error bound

    $$\frac{\| \mathbf{x}-\tilde{\mathbf{x}} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \mathbf{d} \|}{\| \mathbf{b} \|}.$$

    Here $\tilde{\mathbf{x}}$ is a numerical approximation to the exact solution $\mathbf{x}$, and $\mathbf{d}$ is an unknown perturbation of the data caused by machine roundoff. We will assume that $\| \mathbf{d} \|/\| \mathbf{b} \|$ is roughly `eps()`.

    For each $n=10,20,\ldots,70$ let `A = matrixdepot("prolate",n,0.4)` and let $\mathbf{x}$ have components $x_k=k/n$ for $k=1,\ldots,n$. Define `b = A*x` and let $\tilde{\mathbf{x}}$ be the solution produced numerically by backslash.

    Make a table including columns for $n$, the condition number of $\mathbf{A}$, the observed relative error in $\tilde{\mathbf{x}}$, and the right-hand side of the inequality above. You should find that the inequality holds in every case.

  3. ⌨ Exercise 2.3.7 suggests that the solutions of linear systems with

    $$\mathbf{A} = \begin{bmatrix} 1 & -1 & 0 & \alpha-\beta & \beta \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} \alpha \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

    become less accurate as $\beta$ increases. Using $\alpha=0.1$ and $\beta=10,100,\ldots,10^{12}$, make a table with columns for $\beta$, $|x_1-1|$, and the condition number of the matrix.

  4. ⌨ Let $\mathbf{A}_n$ denote the $(n+1)\times(n+1)$ version of the Vandermonde matrix in Equation (2.1.3) based on the equally spaced interpolation nodes $t_i=i/n$ for $i=0,\ldots,n$. Using the 1-norm, graph $\kappa(\mathbf{A}_n)$ as a function of $n$ for $n=4,5,6,\ldots,20$, using a log scale on the $y$-axis. (The graph is nearly a straight line.)

  5. ⌨ The matrix $\mathbf{A}$ in (2.6.2) has unpivoted LU factors given in (2.6.3) as a function of the parameter $\epsilon$. For $\epsilon = 10^{-2},10^{-4},\ldots,10^{-10}$, make a table with columns for $\epsilon$, $\kappa(\mathbf{A})$, $\kappa(\mathbf{L})$, and $\kappa(\mathbf{U})$. (This shows that solution via unpivoted LU factorization is arbitrarily unstable.)

  6. ✍ Define $\mathbf{A}_n$ as the $n\times n$ matrix

    $$\begin{bmatrix} 1 & -2 & & &\\ & 1 & -2 & & \\ & & \ddots & \ddots & \\ & & & 1 & -2 \\ & & & & 1 \end{bmatrix}.$$

    (a) Write out $\mathbf{A}_2^{-1}$ and $\mathbf{A}_3^{-1}$.

    (b) Write out $\mathbf{A}_n^{-1}$ in the general case $n>1$. (If necessary, look at a few more cases in Julia until you are certain of the pattern.) Make a clear argument why it is correct.

    (c) Using the $\infty$-norm, find $\kappa(\mathbf{A}_n)$.

  7. ✍ (a) Prove that for $n\times n$ nonsingular matrices $\mathbf{A}$ and $\mathbf{B}$, $\kappa(\mathbf{A}\mathbf{B})\le \kappa(\mathbf{A})\,\kappa(\mathbf{B})$.

    (b) Show by means of an example that the result of part (a) cannot be an equality in general.

  8. ✍ Let $\mathbf{D}$ be a diagonal $n\times n$ matrix, not necessarily invertible. Prove that in the 1-norm,

    $$\kappa(\mathbf{D}) = \frac{\max_i |D_{ii}|}{\min_i |D_{ii}|}.$$

    (Hint: See Exercise 2.7.10.)