Let’s think a bit about what must be the easiest math problem you’ve dealt with in quite some time: adding 1 to a number. Formally, we describe this problem as a function $f(x) = x + 1$, where $x$ is any real number.
On a computer, $x$ will be represented by its floating-point counterpart, $\operatorname{fl}(x)$. Given the property (1.1.5), we have $\operatorname{fl}(x) = x(1+\epsilon)$ for some $\epsilon$ satisfying $|\epsilon| \le \epsilon_{\text{mach}}/2$. There is no error in representing the value 1.
Let’s suppose that we are fortunate and that the addition proceeds exactly, with no additional errors. Then the machine result is just

$$\tilde{y} = x(1+\epsilon) + 1,$$

and its relative error compared to the exact result $x + 1$ is

$$\frac{|\tilde{y} - (x+1)|}{|x+1|} = \frac{|\epsilon x|}{|x+1|}.$$
This error could be quite large if the denominator is small. In fact, we can make the relative error as large as we please by taking x very close to -1. This is essentially what happened in Demo 1.1.4.
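To see the blowup concretely, here is a minimal Python sketch in the spirit of Demo 1.1.4 (my own illustration, not the book's demo). The intended data is held as an exact rational number, so the measured error comes only from rounding $x$ into floating point and performing the addition:

```python
from fractions import Fraction

# Relative error of evaluating f(x) = x + 1 for x near -1.
# The analysis above predicts error ~ |eps*x| / |x+1|, growing as x -> -1.
for k in [4, 8, 12, 15]:
    x_exact = -1 + Fraction(1, 10**k)   # intended data, held exactly
    x_float = float(x_exact)            # fl(x): rounding happens here
    computed = x_float + 1.0            # machine evaluation of f
    true = x_exact + 1                  # exact result, 10**(-k)
    rel_err = abs(Fraction(computed) - true) / abs(true)
    print(f"x = -1 + 1e-{k:<2}  relative error = {float(rel_err):.2e}")
```

As $x$ approaches $-1$, the printed relative error grows from near $\epsilon_{\text{mach}}$ toward order 1, exactly as the denominator $|x+1|$ shrinks.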
You may have encountered this situation before when using significant digits for scientific calculations. Suppose we round all results to five decimal digits, and we add $-1.0012$ to $1.0000$. The result is $-0.0012$, or $-1.2\times 10^{-3}$ in scientific notation. Notice that even though both operands are specified to five digits, it makes no sense to write more than two digits in the answer, because there is no information in the problem beyond the fourth decimal place.
This phenomenon is known as subtractive cancellation, or loss of significance. We may say that three digits were “lost” in the mapping from -1.0012 to -0.0012. There’s no way the loss could be avoided, regardless of the algorithm, once we decided to round off everything to a fixed number of digits.
In double precision, all the values are represented to about 16 significant decimal digits of precision, but it’s understood that subtractive cancellation may render some of those digits essentially meaningless.
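A small illustration of that digit count in double precision (a sketch of mine, with arbitrarily chosen operands):

```python
# Both operands carry about 16 significant digits, but they agree in the
# first 10, so only about 6 meaningful digits survive the subtraction.
a = 1.2345678901234567
b = 1.2345678900000000
print(a - b)   # only about the first 6 printed digits are trustworthy;
               # the rest is noise from rounding a and b into binary
```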
Now we consider problems more generally. As above, we represent a problem as a function $f$ that maps a real data value $x$ to a real result $f(x)$. We abbreviate this situation by the notation $f: \mathbb{R} \rightarrow \mathbb{R}$, where $\mathbb{R}$ represents the real number set.
When the problem $f$ is approximated in floating point on a computer, the data $x$ is represented as a floating-point value $\tilde{x} = \operatorname{fl}(x)$. Ignoring all other sources of error, we define the quantitative measure

$$\frac{|f(x) - f(\tilde{x})| \,/\, |f(x)|}{|x - \tilde{x}| \,/\, |x|},$$
which is the ratio of the relative changes in result and data. We make this expression more convenient if we recall that floating-point arithmetic gives $\tilde{x} = x(1+\epsilon)$ for some value $|\epsilon| \le \epsilon_{\text{mach}}/2$. Hence

$$\frac{|f(x) - f(x + \epsilon x)|}{|\epsilon| \, |f(x)|}.$$
Finally, we idealize what happens in a perfect computer by taking a limit as $\epsilon_{\text{mach}} \to 0$, arriving at the definition of the relative condition number:

$$\kappa_f(x) = \lim_{\epsilon \to 0} \frac{|f(x + \epsilon x) - f(x)|}{|\epsilon f(x)|}. \qquad (1.2.5)$$
The condition number is a ratio of the relative error of the output to the relative error of the input. It depends only on the problem and the data, not the computer or the algorithm.
Assuming that $f$ has at least one continuous derivative, we can simplify the expression (1.2.5) through some straightforward manipulations:

$$\kappa_f(x) = \lim_{\epsilon \to 0} \left| \frac{f(x + \epsilon x) - f(x)}{\epsilon f(x)} \right| = \left| \frac{x}{f(x)} \right| \lim_{\epsilon \to 0} \left| \frac{f(x + \epsilon x) - f(x)}{\epsilon x} \right| = \left| \frac{x \, f'(x)}{f(x)} \right|.$$
In retrospect, it should come as no surprise that the change in values of f(x) due to small changes in x involves the derivative of f. In fact, if we were making measurements of changes in absolute rather than relative terms, the condition number would be simply ∣f′(x)∣.
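The limit in (1.2.5) is easy to probe numerically. The following Python sketch (my own illustration; the choice $f(x) = e^x$ and the helper name `kappa_estimate` are arbitrary) compares a small-perturbation quotient against the derivative formula $|x f'(x)/f(x)|$:

```python
import math

# Estimate kappa_f(x) from the definition, using a small but not tiny
# perturbation (too small, and roundoff dominates the difference).
def kappa_estimate(f, x, eps=1e-6):
    return abs(f(x*(1 + eps)) - f(x)) / (abs(eps) * abs(f(x)))

# f(x) = exp(x) has kappa_f(x) = |x| by the derivative formula.
for x in [0.5, 2.0, 10.0]:
    est = kappa_estimate(math.exp, x)
    print(f"x = {x:5.1f}   estimate = {est:.6f}   |x f'(x)/f(x)| = {abs(x):.6f}")
```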
Condition numbers of the major elementary functions are given in Table 1.2.1.
Table 1.2.1: Relative condition numbers of elementary functions

| Function | Condition number |
| --- | --- |
| $f(x) = x + c$ | $\kappa_f(x) = \dfrac{\lvert x \rvert}{\lvert x + c \rvert}$ |
| $f(x) = cx$ | $\kappa_f(x) = 1$ |
| $f(x) = x^p$ | $\kappa_f(x) = \lvert p \rvert$ |
| $f(x) = e^x$ | $\kappa_f(x) = \lvert x \rvert$ |
| $f(x) = \sin(x)$ | $\kappa_f(x) = \lvert x \cot(x) \rvert$ |
| $f(x) = \cos(x)$ | $\kappa_f(x) = \lvert x \tan(x) \rvert$ |
| $f(x) = \log(x)$ | $\kappa_f(x) = \dfrac{1}{\lvert \log(x) \rvert}$ |
As you are asked to show in Exercise 4, when two functions $f$ and $g$ are combined in a chain as $h(x) = f(g(x))$, the composite condition number is

$$\kappa_h(x) = \kappa_f(g(x)) \cdot \kappa_g(x). \qquad (1.2.9)$$
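A quick numerical check of this rule (again a sketch of mine, with $f(x) = e^x$ and $g(x) = \sin(x)$ chosen arbitrarily):

```python
import math

# Condition number estimated from the limit definition, as before.
def kappa(f, x, eps=1e-7):
    return abs(f(x*(1 + eps)) - f(x)) / (abs(eps) * abs(f(x)))

g = math.sin
f = math.exp
h = lambda x: f(g(x))   # h(x) = exp(sin(x))

x = 1.2
lhs = kappa(h, x)                      # direct estimate of kappa_h(x)
rhs = kappa(f, g(x)) * kappa(g, x)     # product predicted by the chain rule
print(lhs, rhs)                        # the two values agree to several digits
```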
When the perturbation $\epsilon$ is small, the limit in (1.2.5) implies the approximation

$$\frac{|f(x + \epsilon x) - f(x)|}{|f(x)|} \approx \kappa_f(x) \, |\epsilon|.$$

That is, whenever the data $x$ is perturbed by a small amount, we expect that relative perturbation to be magnified by a factor of $\kappa_f(x)$ in the result.
A large condition number signals that errors much larger than roundoff should be expected in the result. We call a problem poorly conditioned or ill-conditioned when $\kappa_f(x)$ is large, although there is no fixed threshold for the term.
If $\kappa_f \approx 1/\epsilon_{\text{mach}}$, then we can expect the result to have a relative error of as much as 100% simply by expressing the data $x$ in finite precision. Such a function is essentially not computable at this machine epsilon.
You may have noticed that for some functions, such as the square root, the condition number can be less than 1. This means that relative changes get smaller in the passage from input to output. However, every result in floating-point arithmetic is still subject to rounding error at the relative level of $\epsilon_{\text{mach}}$. In practice, $\kappa_f < 1$ is no different from $\kappa_f = 1$.
Most problems have multiple input and output values. These introduce complications into the formal definition of the condition number. Rather than worry over those details here, we can still look at variations in only one output value with respect to one data value at a time.
The calculation in Example 1.2.4 generalizes to polynomials of higher degree.
The condition number of a root can be arbitrarily large. In the extreme case of a repeated root, the condition number is formally infinite, which implies that the ratio of changes in the root to changes in the coefficients cannot be bounded.
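The following sketch (mine, using `numpy.roots` on an arbitrarily chosen cubic with a double root) shows the effect: a relative perturbation of $10^{-10}$ in one coefficient moves the double root by roughly $10^{-5}$:

```python
import numpy as np

# p(x) = (x - 1)^2 (x - 2) = x^3 - 4x^2 + 5x - 2 has a double root at r = 1.
p = np.array([1.0, -4.0, 5.0, -2.0])

delta = 1e-10
q = p.copy()
q[3] *= 1 - delta      # tiny relative perturbation of the constant coefficient

print(np.roots(p))     # roundoff alone already splits the double root a little
print(np.roots(q))     # the roots near 1 move by about sqrt(2*delta) ~ 1.4e-5,
                       # vastly more than the 1e-10 change in the coefficient
```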
✍ Suppose that $f$ and $g$ are real-valued functions that have relative condition numbers $\kappa_f$ and $\kappa_g$, respectively. Define a new function $h(x) = f(g(x))$. Show that for $x$ in the domain of $h$, the relative condition number of $h$ satisfies (1.2.9).
✍ Suppose that $f$ is a function with relative condition number $\kappa_f$, and that $f^{-1}$ is its inverse function. Show that the relative condition number of $f^{-1}$ satisfies

$$\kappa_{f^{-1}}\bigl(f(x)\bigr) = \frac{1}{\kappa_f(x)},$$

provided that $\kappa_f(x) \neq 0$.
✍ Referring to the derivation of (1.2.13), derive an expression for the relative condition number of a root of $ax^2 + bx + c = 0$ due to perturbations in $b$ only.
The polynomial $x^2 - 2x + 1$ has a double root at $1$. Let $r_1(\epsilon)$ and $r_2(\epsilon)$ be the roots of the perturbed polynomial $x^2 - (2+\epsilon)x + 1$.
(a) ✍/⌨ Using a computer or calculator, make a table with rows for $\epsilon = 10^{-2}, 10^{-4}, 10^{-6}, \ldots, 10^{-12}$ and columns for $\epsilon$, $r_1(\epsilon)$, $r_2(\epsilon)$, $\lvert r_1(\epsilon) - 1 \rvert$, and $\lvert r_2(\epsilon) - 1 \rvert$.
(b) ✍ Show that the observations of part (a) satisfy

$$\lvert r_i(\epsilon) - 1 \rvert \approx \epsilon^{\,q}, \qquad i = 1, 2,$$

for some $0 < q < 1$. (This supports the conclusion that $\kappa = \infty$ at the double root.)
✍ Generalize (1.2.13) to finding a root of the $n$th-degree polynomial $p(x) = a_n x^n + \cdots + a_1 x + a_0$, and show that the relative condition number of a root $r$ with respect to perturbations only in $a_k$ is

$$\kappa = \left\lvert \frac{a_k \, r^{k-1}}{p'(r)} \right\rvert.$$
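As a numerical sanity check on this formula (a sketch of mine, not part of the exercise set; the polynomial and the perturbed coefficient are arbitrary choices), one can compare the predicted magnification factor against the effect of an actual perturbation:

```python
import numpy as np

# Check kappa = |a_k * r^(k-1) / p'(r)| on p(x) = x^2 - 3x + 2, root r = 2.
p = np.array([1.0, -3.0, 2.0])       # numpy stores the highest degree first
r = 2.0
k = 1                                # perturb the coefficient a_1 = -3
a_k = p[len(p) - 1 - k]
dp = np.polyder(p)                   # p'(x)
kappa = abs(a_k * r**(k - 1) / np.polyval(dp, r))

eps = 1e-8
q = p.copy()
q[len(q) - 1 - k] *= 1 + eps         # relative perturbation of a_k
r_new = max(np.roots(q).real)        # the perturbed root near 2
observed = abs(r_new - r) / (abs(r) * eps)
print(kappa, observed)               # the two magnification factors agree
```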