Newton’s method in one variable

Newton’s method is the cornerstone of rootfinding. We introduce the key idea with an example in The idea of Newton’s method.

Using general notation, if we have a root approximation \(x_k\), we can construct a linear model of \(f(x)\) using the classic formula for the tangent line of a differentiable function,

(81)\[ q(x) = f(x_k) + f'(x_k)(x-x_k).\]

Since \(q\) is linear, solving \(q(x)=0\) is trivial. We define the next approximation by the condition \(q(x_{k+1})=0\), or

(82)\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.\]

Starting with an initial estimate \(x_1\), this formula defines a sequence of estimates \(x_2,x_3,\ldots\). The iteration so defined is what we call Newton’s method.
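
To make the iteration concrete, here is a minimal sketch of our own (the function and starting value are chosen purely for illustration) applying (82) to \(f(x)=x^2-2\), whose positive root is \(\sqrt{2}\approx 1.414214\):

f = x -> x^2 - 2                           # f
dfdx = x -> 2x                             # and its derivative
x = [1.0]                                  # initial estimate x_1
for k in 1:4
    push!(x, x[k] - f(x[k])/dfdx(x[k]))    # the Newton step (82)
end
println(x)

The estimates approach \(\sqrt{2}\) with remarkable speed, a fact quantified next.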

Quadratic convergence

The graphs of The idea of Newton’s method suggest why the Newton iteration may converge to a root: Any differentiable function looks more and more like its tangent line as we zoom in to the point of tangency. Yet it is far from clear that it must converge, or at what rate it will do so. The matter of the convergence rate is fairly straightforward to resolve. Define the error sequence

(83)\[\epsilon_k = r - x_k, \quad k=1,2,\ldots,\]

where \(r\) is the limit of the sequence and \(f(r)=0\). Substituting \(x_k = r - \epsilon_k\) and \(x_{k+1} = r - \epsilon_{k+1}\) into (82) and rearranging gives

\[ \epsilon_{k+1} = \epsilon_k + \frac{f(r-\epsilon_k)}{f'(r-\epsilon_k)}.\]

We assume that \(|\epsilon_k|\to 0\); that is, the iteration converges, so the errors eventually become, and remain, as small as we please. Then a Taylor expansion of \(f\) about \(x=r\) gives

\[ \epsilon_{k+1} = \epsilon_k + \frac{ f(r) - \epsilon_kf'(r) + \frac{1}{2}\epsilon_k^2f''(r) + O(\epsilon_k^3)}{ f'(r) - \epsilon_kf''(r) + O(\epsilon_k^2)}.\]

We use the fact that \(f(r)=0\) and additionally assume now that \(f'(r)\ne 0\). Then

\[\epsilon_{k+1} = \epsilon_k - \epsilon_k \left[ 1 - \dfrac{1}{2}\dfrac{f''(r)}{f'(r)} \epsilon_k + O(\epsilon_k^2)\right] \, \left[ 1 - \dfrac{f''(r)}{f'(r)}\epsilon_k + O(\epsilon_k^2)\right]^{-1}.\]

The bracketed term in the denominator has the form \((1+z)^{-1}\), here with \(z = -\dfrac{f''(r)}{f'(r)}\epsilon_k + O(\epsilon_k^2)\). Provided \(|z|<1\), this equals the sum of the geometric series \(1-z+z^2-z^3 + \cdots\). Keeping only the lowest-order terms, we derive

(84)\[\begin{split} \epsilon_{k+1} &= \epsilon_k - \epsilon_k \left[ 1 - \dfrac{1}{2}\dfrac{f''(r)}{f'(r)} \epsilon_k + O(\epsilon_k^2) \right] \, \left[ 1 + \dfrac{f''(r)}{f'(r)} \epsilon_k + O(\epsilon_k^2) \right]\\ &= -\frac{1}{2}\, \frac{f''(r)}{f'(r)} \epsilon_k^2 + O(\epsilon_k^3). \end{split}\]

Equation (84) suggests that eventually, each iteration of Newton’s method roughly squares the error. This behavior is called quadratic convergence. The formal definition of quadratic convergence is that there exists a number \(\alpha>0\) such that

(85)\[ \lim_{k\to\infty} \frac{|x_{k+1}-r|}{|x_k-r|^2} = \alpha.\]

Recall that linear convergence is identifiable by trending toward a straight line on a log–linear plot of the error. When the convergence is quadratic, no such straight line exists—the convergence keeps getting steeper. Alternatively, note that (neglecting high-order terms)

\[ \log(|\epsilon_{k+1}|) \approx 2 \log(|\epsilon_{k}|) + \text{constant},\]

which is equivalent to saying that the number of accurate digits approximately doubles at each iteration, once the errors become small enough.
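
Continuing our illustrative \(x^2-2\) sketch from above, we can watch the digits double. By (84), the ratios in (85) should settle near \(\tfrac{1}{2}\left|f''(r)/f'(r)\right| = 1/(2\sqrt{2}) \approx 0.354\):

f = x -> x^2 - 2
dfdx = x -> 2x
r = sqrt(2)                                # the exact root
x = [1.0]
for k in 1:4
    push!(x, x[k] - f(x[k])/dfdx(x[k]))
end
err = @. abs(r - x)                        # error sequence |ϵ_k|
println(log10.(err))                       # negative of the accurate digits; roughly doubles
println(err[2:end] ./ err[1:end-1].^2)     # ratios from (85), approaching about 0.354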

Let’s revisit the assumptions made to derive quadratic convergence as given by (84):

  1. The residual function \(f\) has to have enough continuous derivatives to make the Taylor series expansion valid. Often this is stated as \(f\) being “smooth enough.” This is usually not a problem, but see this exercise.

  2. We required \(f'(r)\neq 0\)—that is, \(r\) must be a simple root. See this exercise to investigate what happens at a multiple root.

  3. We assumed that the sequence converged, which is not easy to guarantee in any particular case. In fact, finding a starting guess from which the Newton iteration converges is often the most challenging part of a rootfinding problem; the sketch following this list shows how badly things can go from a poor starting guess. We will try to deal with this issue in Quasi-Newton methods.
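
To illustrate that last point (a sketch of our own, not an example from the text), apply Newton's method to \(f(x)=\arctan(x)\), whose only root is \(r=0\). Because \(f\) flattens out away from the origin, a starting guess that is too far from the root sends each step past the root by more than before, and the iteration diverges:

f = x -> atan(x)
dfdx = x -> 1/(1 + x^2)
x = [2.0]                                  # starting guess too far from the root
for k in 1:4
    push!(x, x[k] - f(x[k])/dfdx(x[k]))    # each step overshoots further
end
println(x)                                 # the estimates grow without bound

Starting instead from, say, \(x_1=1\), the same iteration converges rapidly to zero.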

Implementation

Function 35 (newton)

Newton’s method for a scalar rootfinding problem.

"""
newton(f,dfdx,x1)

Use Newton's method to find a root of `f` starting from `x1`, where
`dfdx` is the derivative of `f`. Returns a vector of root estimates.
"""
function newton(f,dfdx,x1)
    # Operating parameters.
    funtol = 100*eps();  xtol = 100*eps();  maxiter = 40;

    x = [x1]
    y = f(x1)
    dx = Inf   # for initial pass below
    k = 1

    while (abs(dx) > xtol) && (abs(y) > funtol) && (k < maxiter)
        dydx = dfdx(x[k])
        dx = -y/dydx            # Newton step
        push!(x,x[k]+dx)        # append new estimate

        k = k+1
        y = f(x[k])
    end

    if k==maxiter
        @warn "Maximum number of iterations reached."
    end

    return x
end

Our implementation of Newton’s iteration is given in newton. It accepts mathematical functions \(f\) and \(f'\) and the starting guess \(x_1\) as input arguments. Beginning programmers are tempted to embed the functions directly into the code, but there are two good reasons not to do so. First, you would need a new copy of the whole code for each new instance of the problem, even though very little code may need to change. Second, you may want to try more than one rootfinding implementation for a particular problem, and keeping the definition of the problem separate from the algorithm for its solution makes this task much easier. As a practical issue in Julia, you can define short functions inline as anonymous functions, as in Convergence of Newton’s method, while longer functions can be defined with the usual function syntax and passed to newton by name.
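
For example, a complete call using inline definitions might look like the following (a usage sketch of our own; the function and starting value are chosen just for illustration).

f = x -> x^2 - 2          # inline (anonymous) definition of f
dfdx = x -> 2x            # and of its derivative
x = newton(f, dfdx, 1.0)  # returns the vector of root estimates
println(x[end])           # final estimate, very close to sqrt(2)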

The function newton also deals with a thorny practical issue: how to stop the iteration. It adopts a three-part criterion. First, it monitors the difference between successive root estimates, \(|x_k-x_{k-1}|\), which is used as a stand-in for the unknown error \(|r-x_k|\). In addition, it monitors the residual \(|f(x_k)|\), which is equivalent to the backward error and more realistic to control in badly conditioned problems (see The rootfinding problem). If either of these quantities is considered to be sufficiently small, the iteration ends. Finally, we need to protect against the possibility of a nonconvergent iteration, so the procedure terminates with a warning if a maximum number of iterations is exceeded.1
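
As the footnote suggests, more practical codes let the user control these thresholds. One way to do that in Julia (a sketch under our own hypothetical name newton_adjustable, not part of the book's function) is with keyword arguments that default to the values used above:

# Sketch: the same algorithm as newton, with user-controllable stopping parameters.
function newton_adjustable(f, dfdx, x1; funtol=100*eps(), xtol=100*eps(), maxiter=40)
    x = [x1];  y = f(x1);  dx = Inf;  k = 1
    while (abs(dx) > xtol) && (abs(y) > funtol) && (k < maxiter)
        dx = -y / dfdx(x[k])        # Newton step
        push!(x, x[k] + dx)
        k += 1
        y = f(x[k])
    end
    k == maxiter && @warn "Maximum number of iterations reached."
    return x
end

A call such as newton_adjustable(f, dfdx, 1.0; xtol=1e-8) then loosens the step-size tolerance without touching the source.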

Exercises

For each of problems 1–3, do the following steps.

(a) ✍ Rewrite the equation into the standard form for rootfinding, \(f(x) = 0\), and compute \(f'(x)\).

(b) ⌨ Make a plot of \(f\) over the given interval and determine how many roots lie in the interval.

(c) ⌨ Use nlsolve to find an “exact” value for each root.

(d) ⌨ Use newton to find each root.

(e) ⌨ For one of the roots, define e as a vector of the errors in the Newton sequence. Determine numerically whether the convergence is roughly quadratic.

  1. \(x^2=e^{-x}\), over \([-2,2]\)

  2. \(2x = \tan x\), over \([-0.2,1.4]\)

  3. \(e^{x+1}=2+x\), over \([-2,2]\)

  4. ⌨ Consider the equation \(f(x)=x^{-2} - \sin x=0\) on the interval \(x \in [0.1,4\pi]\). Use a plot to approximately locate the roots of \(f\). To which roots do the following initial guesses converge when using newton? Is the root obtained the one that is closest to that guess?

    (a) \(x_1 = 1.5,\quad\) (b) \(x_1 = 2,\quad\) (c) \(x_1 = 3.2,\quad\) (d) \(x_1 = 4,\quad\) (e) \(x_1 = 5,\quad\) (f) \(x_1 = 2\pi\).

  5. ✍ Show that if \(f(x)=x^{-1}-b\) for nonzero \(b\), then Newton’s iteration converging to the root \(r=1/b\) can be implemented without performing any divisions.

  6. ✍ Discuss what happens when Newton’s method is applied to find a root of \(f(x) = \operatorname{sign}(x) \sqrt{|x|}\), starting at \(x_1\ne 0\).

  7. ✍ In the case of a multiple root, where \(f(r)=f'(r)=0\), the derivation of the quadratic error convergence in (84) is invalid. Redo the derivation to show that in this circumstance and with \(f''(r)\neq 0\), the error converges only linearly.

  8. ✍ In newton and elsewhere, the actual error is not available, so we use \(|x_k-x_{k-1}|\) as an approximate indicator of error to determine when to stop the iteration. Find an example that foils this indicator; that is, a sequence \(\{x_k\}\) such that

    \[\lim_{k\rightarrow \infty} (x_k-x_{k-1}) = 0,\]

    but \(\{x_k\}\) diverges. (Hint: You have seen such sequences in calculus.) Hence the need for residual tolerances and escape hatches in the code!


1. In more practical codes, the thresholds used to make these decisions are controllable through additional user inputs to the procedure.