The rootfinding problem - Fundamentals of Numerical Computation

For the time being we will focus on the rootfinding problem for single functions of one variable.

The formulation $f(x)=0$ is general enough to solve any equation. If we are trying to solve an equation $g(x)=h(x)$ , we can define $f=g-h$ and find a root of $f$ .

Unlike the linear problems of the earlier chapters, the usual situation here is that the root cannot be produced in a finite number of operations, even in exact arithmetic. Instead, we seek a sequence of approximations that formally converge to the root, stopping when some member of the sequence seems to be good enough, in a sense we will clarify later.

Example 4.1.1 (The rootfinding problem for Bessel functions)

In the theory of vibrations of a circular drum, the displacement of the drumhead can be expressed in terms of pure harmonic modes,

J_m(\omega_{k,m} r) \cos(m\theta) \cos(c \omega_{k,m} t),

(4.1.1)

where $(r,\theta)$ are polar coordinates, $0\le r\le 1$ , $t$ is time, $m$ is a positive integer, $c$ is a material parameter, and $J_m$ is a Bessel function of the first kind. The quantity $\omega_{k,m}$ is a resonant frequency and is a positive root of the equation

J_m(\omega_{k,m}) = 0,

(4.1.2)

which states that the drumhead is clamped around the rim. Bessel functions often appear in physical problems featuring radial symmetry, and tabulating approximations to the zeros of Bessel functions occupied numerous mathematician-hours before computers were on the scene.

Julia

MATLAB

Python

Example 4.1.1

using Plots, SpecialFunctions
J₃(x) = besselj(3, x)
plot(J₃, 0, 20;
    title="Bessel function",
    xaxis=(L"x"),  yaxis=(L"J_3(x)"),  grid=:xy)

From the graph we see roots near 6, 10, 13, 16, and 19. We use nlsolve from the NLsolve package to find these roots accurately. It uses vector variables, so we have to code accordingly.

using NLsolve
ω = []
for guess = [6., 10. ,13., 16., 19.]
    s = nlsolve(x -> J₃(x[1]), [guess], ftol=1e-14)
    append!(ω, s.zero)
end

y = J₃.(ω)
@pt :header=["root estimate", "function value"] [ω y]

scatter!(ω, y, title="Bessel function with roots")

If instead we seek values at which $J_3(x)=0.2$ , then we must find roots of the function $J_3(x)-0.2$ .

r = []
for guess = [3., 6., 10., 13.]
    f(x) = J₃(x[1]) - 0.2
    s = nlsolve(f, [guess], ftol=1e-14)
    append!(r, s.zero)
end
scatter!(r, J₃.(r), title="Roots and other Bessel values")

Example 4.1.1

J3 = @(x) besselj(3,x);
fplot(J3, [0, 20])
grid on
xlabel('x'), ylabel('J_3(x)')  
title('Bessel function')

From the graph we see roots near 6, 10, 13, 16, and 19. We use nlsolve from the NLsolve package to find these roots accurately. It uses vector variables, so we have to code accordingly.

omega = [];
for guess = [6, 10, 13, 16, 19]
    omega = [omega; fzero(J3, guess)];
end
omega

table(omega, J3(omega), 'VariableNames', {'root estimate', 'function value'})

hold on
scatter(omega, J3(omega))
title('Bessel roots')

If instead we seek values at which $J_3(x)=0.2$ , then we must find roots of the function $J_3(x)-0.2$ .

omega = [];
for guess = [3, 6, 10, 13]
    f = @(x) J3(x) - 0.2;
    omega = [omega; fzero(f, guess)];
end
scatter(omega, J3(omega), '<')

Example 4.1.1

import scipy.special as special
def J3(x):
    return special.jv(3.0, x)

xx = linspace(0, 20, 500)
fig, ax = subplots()
ax.plot(xx, J3(xx))
ax.grid()
xlabel("$x$"), ylabel("$J_3(x)$")
title("Bessel function");

From the graph we see roots near 6, 10, 13, 16, and 19. We use root_scalar from the scipy.optimize package to find these roots accurately.

from scipy.optimize import root_scalar

omega = []
for guess in [6.0, 10.0, 13.0, 16.0, 19.0]:
    s = root_scalar(J3, bracket=[guess - 0.5, guess + 0.5]).root
    omega.append(s)

results = PrettyTable()
results.add_column("root estimate", omega)
results.add_column("function value", [J3(ω) for ω in omega])
print(results)

+--------------------+-------------------------+
|   root estimate    |      function value     |
+--------------------+-------------------------+
| 6.380161895923983  |  2.5877058405892666e-16 |
| 9.761023129981195  | -1.1824718824668837e-13 |
| 13.015200721698434 |  -1.135864957485476e-16 |
| 16.223466160318758 |  -1.898385896228919e-15 |
| 19.40941522643501  |  7.862906169587308e-16  |
+--------------------+-------------------------+

ax.scatter(omega, J3(omega))
ax.set_title("Bessel function roots")
fig

If instead we seek values at which $J_3(x)=0.2$ , then we must find roots of the function $J_3(x)-0.2$ .

omega = []
for guess in [3., 6., 10., 13.]:
    f = lambda x: J3(x) - 0.2
    s = root_scalar(f, x0=guess).root
    omega.append(s)

ax.scatter(omega, J3(omega))
fig

4.1.1Conditioning, error, and residual¶

In the rootfinding problem, the data is a continuous function $f$ and the result is a root. (This overrides our Chapter 1 notation of $f$ as the map from data to result.) How does the result change in response to perturbations in $f$ ? We will compute an absolute condition number rather than a relative one.

You might wonder about the relevance of perturbing a function as data to a problem. If nothing else, the values of $f$ will be represented in floating point and thus subject to rounding error. Furthermore, in many applications, $f$ might not be a simple formula but the result of a computation that uses an inexact algorithm. While there are infinitely many possible perturbations to a function, a constant perturbation is enough to get the main idea.

We assume $f$ has at least one continuous derivative near a particular root $r$ . Suppose that $f$ is perturbed to $\tilde{f}(x) = f(x) + \epsilon$ . As a result, the root (if it still exists) will be perturbed to $\tilde{r} = r + \delta$ such that $\tilde{f}(\tilde{r})=0$ . We now compute an absolute condition number $\kappa_r$ , which is the ratio $\left | \frac{\delta}{\epsilon} \right|$ as $\epsilon\to 0$ .

Using Taylor’s theorem,

0 = f(r+\delta) + \epsilon \approx f(r) + f'(r) \delta + \epsilon.

(4.1.3)

Since $r$ is a root, we have $f(r)=0$ . This lets us relate δ to ε, and their ratio is the condition number.

Equivalently, (4.1.4) is just the magnitude of the derivative of the inverse function $f^{-1}$ at zero.

Example 4.1.2 (Condition number of a rootfinding problem)

Julia

MATLAB

Python

Example 4.1.2

Consider first the function

f(x) = (x - 1) * (x - 2);

At the root $r=1$ , we have $f'(r)=-1$ . If the values of $f$ were perturbed at every point by a small amount of noise, we can imagine finding the root of the function drawn with a thick ribbon, giving a range of potential roots.

interval = [0.8, 1.2]

plot(f, interval..., ribbon=0.03, aspect_ratio=1,
    xlabel=L"x", yaxis=(L"f(x)", [-0.2, 0.2]))

scatter!([1], [0], title="Well-conditioned root")

The possible values for a perturbed root all lie within the interval where the ribbon intersects the $x$ -axis. The width of that zone is about the same as the vertical thickness of the ribbon.

By contrast, consider the function

f(x) = (x - 1) * (x - 1.01);

Now $f'(1)=-0.01$ , and the graph of $f$ will be much shallower near $x=1$ . Look at the effect this has on our thick rendering:

plot(f, interval..., ribbon=0.03, aspect_ratio=1,
    xlabel=L"x", yaxis=(L"f(x)", [-0.2, 0.2]))

scatter!([1], [0], title="Poorly-conditioned root")

The vertical displacements in this picture are exactly the same as before. But the potential horizontal displacement of the root is much wider. In fact, if we perturb the function entirely upward by the amount drawn here, the root disappears!

Example 4.1.2

Consider first the function

f  = @(x) (x - 1) .* (x - 2);

clf
interval = [0.8, 1.2];
fplot(f, interval)
grid on, hold on
fplot(@(x) f(x) + 0.02, interval, 'k')
fplot(@(x) f(x) - 0.02, interval, 'k')
axis equal,  yticks([0])
ylim([-0.1, 0.1])
xlabel('x'), ylabel('f(x)')
title('Well-conditioned root')

The possible values for a perturbed root all lie within the interval where the ribbon intersects the $x$ -axis. The width of that zone is about the same as the vertical thickness of the ribbon.

By contrast, consider the function

f = @(x) (x - 1) .* (x - 1.01);

Now $f'(1)=-0.01$ , and the graph of $f$ will be much shallower near $x=1$ . Look at the effect this has on our thick rendering:

axis(axis), cla
fplot(f, interval)
fplot(@(x) f(x) + 0.02, interval, 'k')
fplot(@(x) f(x) - 0.02, interval, 'k')
ylim([-0.1, 0.1]), yticks([0])
title('Poorly conditioned root')

Example 4.1.2

Consider first the function

f = lambda x: (x - 1) * (x - 2)

xx = linspace(0.8, 1.2, 400)
plot(xx, f(xx))
plot(xx, f(xx) + 0.02, "k")
plot(xx, f(xx) - 0.02, "k")
axis("equal"), grid(True)
xlabel("x"), ylabel("f(x)")
title("Well-conditioned root");

The possible values for a perturbed root all lie within the interval where the ribbon intersects the $x$ -axis. The width of that zone is about the same as the vertical thickness of the ribbon.

By contrast, consider the function

f = lambda x: (x - 1) * (x - 1.01)

Now $f'(1)=-0.01$ , and the graph of $f$ will be much shallower near $x=1$ . Look at the effect this has on our thick rendering:

xx = linspace(0.8, 1.2, 400)
plot(xx, f(xx))
plot(xx, f(xx) + 0.02, "k")
plot(xx, f(xx) - 0.02, "k")
axis("equal"), grid(True)
xlabel("x"), ylabel("f(x)")
title("Poorly-conditioned root");

We must accept that when $|f'|$ is small at the root, it may not be possible to get a small error in a computed root estimate. As always, the error is not a quantity we can compute without knowing the exact answer. There is something else we can measure, though.

It stands to reason that a small residual might be associated with a small error. To quantify the relationship, let $\tilde{r}$ approximate root $r$ , and define the new function $g(x)=f(x)-f(\tilde{r})$ . Trivially, $g(\tilde{r})=0$ , meaning that $\tilde{r}$ is a true root of $g$ . Since the difference $g(x)-f(x)$ is the residual value $f(\tilde{r})$ , the residual is the distance to an exactly solved rootfinding problem.

In general, it is not realistic to expect a small error in a root approximation if the condition number (4.1.4) is large. However, we can gauge the backward error from the residual.

4.1.2Multiple roots¶

The condition number (4.1.4) naturally leads to the question of what happens if $f'(r)=0$ at a root $r$ . The following definition agrees with and extends the notion of algebraic multiplicity in a polynomial to roots of more general differentiable functions.

Another useful characterization of multiplicity $m$ is that $f(x)=(x-r)^m q(x)$ for a differentiable $q$ with $q(r)\neq 0$ .

When $r$ is a nonsimple root, the condition number (4.1.4) is effectively infinite.^[1] However, even if $r$ is simple, we should expect difficulty in rootfinding if the condition number is very large. This occurs when $|f'(r)|$ is very small, which means that the quotient $q$ satisfies $q(r)\approx 0$ and another root of $f$ is very close to $r$ . We made the same observation about polynomial roots all the way back in Demo 1.4.3.

4.1.3Exercises¶

⌨ For each equation and given interval, do the following steps.

(a) Rewrite the equation into the standard form for rootfinding, $f(x) = 0$ . Make a plot of $f$ over the given interval and determine how many roots lie in the interval.

(b) Use nlsolve to find each root, as shown in Demo 4.1.1.

(c) Compute the condition number of each root found in part (b).

$x^2=e^{-x}$ , over $[-2,2]$
$2x = \tan x$ , over $[-0.2,1.4]$
$e^{x+1}=2+x$ , over $[-2,2]$
⌨ A basic safe type of investment is an annuity: one makes monthly deposits of size $P$ for $n$ months at a fixed annual interest rate $r$ , and at maturity collects the amount
$\frac{12 P}{r} \left( \Bigl(1+\frac{r}{12}\Bigr)^n - 1\right).$
(4.1.5)
Say you want to create an annuity for a term of 300 months and final value of $1,000,000. Using nlsolve, make a table of the interest rate you will need to get for each of the different contribution values $P=500,550,\ldots,1000$ .
⌨ The most easily observed properties of the orbit of a celestial body around the sun are the period τ and the elliptical eccentricity ε. (A circle has $\epsilon=0$ .) From these, it is possible to find at any time $t$ the angle $\theta(t)$ made between the body’s position and the major axis of the ellipse. This is done through
$\tan \frac{\theta}{2} = \sqrt{\frac{1+\epsilon}{1-\epsilon}}\, \tan \frac{\psi}{2},$
(4.1.6)
where the eccentric anomaly $\psi(t)$ satisfies Kepler’s equation:
$\psi - \epsilon \sin \psi - \frac{2\pi t}{\tau} = 0.$
(4.1.7)
Equation (4.1.7) must be solved numerically to find $\psi(t)$ , and then (4.1.6) can be solved analytically to find $\theta(t)$ .
The asteroid Eros has $\tau=1.7610$ years and $\epsilon=0.2230$ . Using nlsolve for (4.1.7), make a plot of $\theta(t)$ for 100 values of $t$ between 0 and τ, which is one full orbit. (Note: Use mod(θ,2π) to put the angle between 0 and $2\pi$ if you want the result to be a continuous function.)
⌨ Lambert’s $W$ function is defined as the inverse of $x e^x$ . That is, $y=W(x)$ if and only if $x=ye^y$ . Write a function lambertW that computes $W$ using nlsolve. Make a plot of $W(x)$ for $0\le x \le 4$ .
✍ For each function, find the multiplicity of the given root. If it is a simple root, find its absolute condition number.
(a) $f(x) = x^3-2x^2+x-2$ , root $r=2$
(b) $f(x) = (\cos x + 1)^2$ , root $r=\pi$
(c) $f(x) = \frac{\sin^2 x}{x}$ , root $r=0$ (define $f(0) =0$ )
(d) $f(x) =(x-1)\log(x)$ , root $r=1$

Footnotes¶

Based on our definitions, this means that the relative change to the root when $f$ is changed by a perturbation of size ε is not $O(\epsilon)$ as $\epsilon\to 0$ .
↩

Preface

4. Roots of nonlinear equations

Preface

Fixed-point iteration