Skip to article frontmatterSkip to article content

The rootfinding problem

For the time being we will focus on the rootfinding problem for single functions of one variable.

The formulation f(x)=0f(x)=0 is general enough to solve any equation. If we are trying to solve an equation g(x)=h(x)g(x)=h(x), we can define f=ghf=g-h and find a root of ff.

Unlike the linear problems of the earlier chapters, the usual situation here is that the root cannot be produced in a finite number of operations, even in exact arithmetic. Instead, we seek a sequence of approximations that formally converge to the root, stopping when some member of the sequence seems to be good enough, in a sense we will clarify later. The NLsolve package for Julia has a function nlsolve for general-purpose rootfinding.

4.1.1Conditioning, error, and residual

In the rootfinding problem, the data is a continuous function ff and the result is a root. (This overrides our Chapter 1 notation of ff as the map from data to result.) How does the result change in response to perturbations in ff? We will compute an absolute condition number rather than a relative one.

You might wonder about the relevance of perturbing a function as data to a problem. If nothing else, the values of ff will be represented in floating point and thus subject to rounding error. Furthermore, in many applications, ff might not be a simple formula but the result of a computation that uses an inexact algorithm. While there are infinitely many possible perturbations to a function, a constant perturbation is enough to get the main idea.

We assume ff has at least one continuous derivative near a particular root rr. Suppose that ff is perturbed to f~(x)=f(x)+ϵ\tilde{f}(x) = f(x) + \epsilon. As a result, the root (if it still exists) will be perturbed to r~=r+δ\tilde{r} = r + \delta such that f~(r~)=0\tilde{f}(\tilde{r})=0. We now compute an absolute condition number κr\kappa_r, which is the ratio δϵ\left | \frac{\delta}{\epsilon} \right| as ϵ0\epsilon\to 0.

Using Taylor’s theorem,

0=f(r+δ)+ϵf(r)+f(r)δ+ϵ. 0 = f(r+\delta) + \epsilon \approx f(r) + f'(r) \delta + \epsilon.

Since rr is a root, we have f(r)=0f(r)=0. This lets us relate δ to ε, and their ratio is the condition number.

Equivalently, (4.1.4) is just the magnitude of the derivative of the inverse function f1f^{-1} at zero.

We must accept that when f|f'| is small at the root, it may not be possible to get a small error in a computed root estimate. As always, the error is not a quantity we can compute without knowing the exact answer. There is something else we can measure, though.

It stands to reason that a small residual might be associated with a small error. To quantify the relationship, let r~\tilde{r} approximate root rr, and define the new function g(x)=f(x)f(r~)g(x)=f(x)-f(\tilde{r}). Trivially, g(r~)=0g(\tilde{r})=0, meaning that r~\tilde{r} is a true root of gg. Since the difference g(x)f(x)g(x)-f(x) is the residual value f(r~)f(\tilde{r}), the residual is the distance to an exactly solved rootfinding problem.

In general, it is not realistic to expect a small error in a root approximation if the condition number (4.1.4) is large. However, we can gauge the backward error from the residual.

4.1.2Multiple roots

The condition number (4.1.4) naturally leads to the question of what happens if f(r)=0f'(r)=0 at a root rr. The following definition agrees with and extends the notion of algebraic multiplicity in a polynomial to roots of more general differentiable functions.

Another useful characterization of multiplicity mm is that f(x)=(xr)mq(x)f(x)=(x-r)^m q(x) for a differentiable qq with q(r)0q(r)\neq 0.

When rr is a nonsimple root, the condition number (4.1.4) is effectively infinite.[1] However, even if rr is simple, we should expect difficulty in rootfinding if the condition number is very large. This occurs when f(r)|f'(r)| is very small, which means that the quotient qq satisfies q(r)0q(r)\approx 0 and another root of ff is very close to rr. We made the same observation about polynomial roots all the way back in Demo 1.4.3.

4.1.3Exercises

⌨ For each equation and given interval, do the following steps.

(a) Rewrite the equation into the standard form for rootfinding, f(x)=0f(x) = 0. Make a plot of ff over the given interval and determine how many roots lie in the interval.

(b) Use nlsolve to find each root, as shown in Demo 4.1.1.

(c) Compute the condition number of each root found in part (b).

  1. x2=exx^2=e^{-x}, over [2,2][-2,2]

  2. 2x=tanx2x = \tan x, over [0.2,1.4][-0.2,1.4]

  3. ex+1=2+xe^{x+1}=2+x, over [2,2][-2,2]


  4. ⌨ A basic safe type of investment is an annuity: one makes monthly deposits of size PP for nn months at a fixed annual interest rate rr, and at maturity collects the amount

    12Pr((1+r12)n1).\frac{12 P}{r} \left( \Bigl(1+\frac{r}{12}\Bigr)^n - 1\right).

    Say you want to create an annuity for a term of 300 months and final value of $1,000,000. Using nlsolve, make a table of the interest rate you will need to get for each of the different contribution values P=500,550,,1000P=500,550,\ldots,1000.

  5. ⌨ The most easily observed properties of the orbit of a celestial body around the sun are the period τ and the elliptical eccentricity ε. (A circle has ϵ=0\epsilon=0.) From these, it is possible to find at any time tt the angle θ(t)\theta(t) made between the body’s position and the major axis of the ellipse. This is done through

    tanθ2=1+ϵ1ϵtanψ2,\tan \frac{\theta}{2} = \sqrt{\frac{1+\epsilon}{1-\epsilon}}\, \tan \frac{\psi}{2},

    where the eccentric anomaly ψ(t)\psi(t) satisfies Kepler’s equation:

    ψϵsinψ2πtτ=0.\psi - \epsilon \sin \psi - \frac{2\pi t}{\tau} = 0.

    Equation (4.1.7) must be solved numerically to find ψ(t)\psi(t), and then (4.1.6) can be solved analytically to find θ(t)\theta(t).

    The asteroid Eros has τ=1.7610\tau=1.7610 years and ϵ=0.2230\epsilon=0.2230. Using nlsolve for (4.1.7), make a plot of θ(t)\theta(t) for 100 values of tt between 0 and τ, which is one full orbit. (Note: Use mod(θ,2π) to put the angle between 0 and 2π2\pi if you want the result to be a continuous function.)

  6. ⌨ Lambert’s WW function is defined as the inverse of xexx e^x. That is, y=W(x)y=W(x) if and only if x=yeyx=ye^y. Write a function lambertW that computes WW using nlsolve. Make a plot of W(x)W(x) for 0x40\le x \le 4.

  7. ✍ For each function, find the multiplicity of the given root. If it is a simple root, find its absolute condition number.

    (a) f(x)=x32x2+x2f(x) = x^3-2x^2+x-2, root r=2r=2

    (b) f(x)=(cosx+1)2f(x) = (\cos x + 1)^2, root r=πr=\pi

    (c) f(x)=sin2xxf(x) = \frac{\sin^2 x}{x}, root r=0r=0 (define f(0)=0f(0) =0)

    (d) f(x)=(x1)log(x)f(x) =(x-1)\log(x), root r=1r=1

Footnotes
  1. Based on our definitions, this means that the relative change to the root when ff is changed by a perturbation of size ε is not O(ϵ)O(\epsilon) as ϵ0\epsilon\to 0.