
Newton’s method

Newton’s method is the cornerstone of rootfinding. We introduce the key idea with an example in Demo 4.3.1.

Using general notation, if we have a root approximation $x_k$, we can construct a linear model of $f(x)$ using the classic formula for the tangent line of a differentiable function,

$$
q(x) = f(x_k) + f'(x_k)(x-x_k).
$$

Finding the root of $q(x)=0$ is trivial. We define the next approximation by the condition $q(x_{k+1})=0$, which leads to the Newton iteration

$$
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \qquad k=1,2,\ldots. \tag{4.3.2}
$$
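As a quick illustration, here is a minimal Julia sketch of the formula in action; the residual $f(x)=x^2-2$, whose root is $\sqrt{2}$, is an arbitrary choice.

```julia
f(x) = x^2 - 2           # example residual; the root is sqrt(2)
dfdx(x) = 2x             # its derivative

x = [1.0]                # iterates, starting from x_1 = 1
for k in 1:5
    push!(x, x[k] - f(x[k]) / dfdx(x[k]))   # the update (4.3.2)
end
@show x
```

Each iterate roughly doubles the number of accurate digits, a behavior explained by the convergence analysis that follows.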

4.3.1 Convergence

The graphs of Demo 4.3.1 suggest why the Newton iteration may converge to a root: any differentiable function looks more and more like its tangent line as we zoom in to the point of tangency. Yet it is far from clear that it must converge, or at what rate it will do so. The matter of the convergence rate is fairly straightforward to resolve. Define the error sequence

$$
\epsilon_k = x_k - r, \quad k=1,2,\ldots,
$$

where $r$ is the limit of the sequence and $f(r)=0$. Exchanging $x$-values for $\epsilon$-values in (4.3.2) gives

$$
\epsilon_{k+1} + r = \epsilon_k + r - \frac{f(r+\epsilon_k)}{f'(r+\epsilon_k)}.
$$

We assume that $|\epsilon_k|\to 0$; that is, the errors eventually become, and remain, as small as we please. Then a Taylor expansion of $f$ about $x=r$ gives

$$
\epsilon_{k+1} = \epsilon_k - \frac{ f(r) + \epsilon_k f'(r) + \frac{1}{2}\epsilon_k^2 f''(r) + O(\epsilon_k^3)}{ f'(r) + \epsilon_k f''(r) + O(\epsilon_k^2)}.
$$

We use the fact that $f(r)=0$ and additionally assume now that $r$ is a simple root, i.e., $f'(r)\neq 0$. Then

$$
\epsilon_{k+1} = \epsilon_k - \epsilon_k \left[ 1 + \frac{1}{2}\frac{f''(r)}{f'(r)} \epsilon_k + O(\epsilon_k^2)\right] \left[ 1 + \frac{f''(r)}{f'(r)}\epsilon_k + O(\epsilon_k^2)\right]^{-1}.
$$

The series in the denominator is of the form $1/(1+z)$. Provided $|z|<1$, this is the limit of the geometric series $1 - z + z^2 - z^3 + \cdots$. Keeping only the lowest-order terms, we derive

$$
\begin{split}
\epsilon_{k+1} &= \epsilon_k - \epsilon_k \left[ 1 + \frac{1}{2}\frac{f''(r)}{f'(r)} \epsilon_k + O(\epsilon_k^2) \right] \left[ 1 - \frac{f''(r)}{f'(r)} \epsilon_k + O(\epsilon_k^2) \right]\\
&= \frac{1}{2}\, \frac{f''(r)}{f'(r)} \epsilon_k^2 + O(\epsilon_k^3). \tag{4.3.7}
\end{split}
$$

Recall that linear convergence is identifiable by trending toward a straight line on a log-linear plot of the error. When the convergence is quadratic, no such straight line exists; the convergence keeps getting steeper. As a numerical test, note that $|\epsilon_{k+1}| \approx K |\epsilon_{k}|^2$ implies that, with $L = \log K$, as $k\to\infty$,

$$
\begin{align*}
\log |\epsilon_{k+1}| & \approx 2 \log |\epsilon_{k}| + L,\\
\frac{\log |\epsilon_{k+1}|}{\log |\epsilon_{k}|} &\approx 2 + \frac{L}{\log |\epsilon_{k}|} \to 2.
\end{align*}
$$
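This test is easy to carry out numerically. Below is a Julia sketch, reusing the arbitrary example $f(x)=x^2-2$; extended precision is used so that the quadratic convergence remains visible for several iterations before roundoff interferes.

```julia
# Ratio of successive log-errors; it should approach 2 for
# quadratic convergence. (A sketch: f(x) = x^2 - 2 is an arbitrary
# example with the known root sqrt(2).)
function logratio_test(niter=6)
    f(x) = x^2 - 2
    dfdx(x) = 2x
    r = sqrt(BigFloat(2))       # reference root in extended precision
    x = BigFloat(1)
    err = [abs(x - r)]
    for k in 1:niter
        x -= f(x) / dfdx(x)     # Newton update (4.3.2)
        push!(err, abs(x - r))
    end
    return [Float64(log(err[k+1]) / log(err[k])) for k in 1:niter]
end

logratio_test()
```

The returned ratios trend toward 2, consistent with (4.3.7).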

Let’s summarize the assumptions made to derive quadratic convergence as given by (4.3.7):

  1. The residual function $f$ has to have enough continuous derivatives to make the Taylor series expansion valid. Often this is stated as $f$ having sufficient smoothness. This is usually not a problem, but see Exercise 6.
  2. We required $f'(r)\neq 0$, meaning that $r$ must be a simple root. See Exercise 7 to investigate what happens at a multiple root.
  3. We assumed that the sequence converged, which is not easy to guarantee in any particular case. In fact, finding a starting value from which the Newton iteration converges is often the most challenging part of a rootfinding problem, as the sketch below illustrates. We will try to deal with this issue in Quasi-Newton methods.
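For instance (an illustrative sketch; $f(x)=\arctan x$, whose only root is $x=0$, is a standard example of this sensitivity), the iteration converges from $x_1=1$ but diverges from $x_1=2$:

```julia
# Sensitivity to the starting value: Newton's method on f(x) = atan(x),
# whose only root is x = 0.
f(x) = atan(x)
dfdx(x) = 1 / (1 + x^2)

for x1 in (1.0, 2.0)
    x = x1
    for k in 1:6
        x -= f(x) / dfdx(x)    # Newton update (4.3.2)
    end
    println("x1 = $x1 gives x7 ≈ $x")
end
```

From $x_1=1$ the first step lands close enough to the root for quadratic convergence to take over; from $x_1=2$ each step overshoots farther than the last.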

4.3.2 Implementation

Our implementation of Newton’s iteration is given in Function 4.3.2. It accepts functions that evaluate $f$ and $f'$ and the starting value $x_1$ as input arguments. Beginning programmers are tempted to embed $f$ and $f'$ directly into the code, but there are two good reasons not to do so. First, each new rootfinding problem would require its own copy of the code, creating a lot of duplication. Second, you may want to try more than one rootfinding algorithm for a particular problem, and keeping the definition of the problem separate from the algorithm for its solution makes this task much easier.

Function 4.3.2 also deals with a thorny practical issue: how to stop the iteration. It adopts a three-part criterion. First, it monitors the difference between successive root estimates, $|x_k - x_{k-1}|$, which is used as a stand-in for the unknown error $|x_k - r|$. In addition, it monitors the residual $|f(x_k)|$, which is equivalent to the backward error and more realistic to control in badly conditioned problems (see The rootfinding problem). If either of these quantities is considered to be sufficiently small, the iteration ends. Finally, we need to protect against the possibility of a nonconvergent iteration, so the procedure terminates with a warning if a maximum number of iterations is exceeded.
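Since Function 4.3.2 itself is not reproduced here, the following is a minimal Julia sketch consistent with that description. The name `newton`, the keyword arguments, and the tolerance defaults are illustrative assumptions, not necessarily those of the book’s code.

```julia
"""
Sketch of Newton's iteration for f(x) = 0 with the three-part stopping
criterion described above. Illustrative only: the tolerance names and
defaults are assumptions.
"""
function newton(f, dfdx, x1; maxiter=40, xtol=1e-13, ftol=1e-13)
    x = [float(x1)]                  # all iterates, starting from x_1
    y = f(x[1])
    dx = Inf                         # forces at least one iteration
    k = 1
    while (abs(dx) > xtol) && (abs(y) > ftol)
        dx = -y / dfdx(x[k])         # Newton step
        push!(x, x[k] + dx)          # next root estimate
        k += 1
        y = f(x[k])                  # residual at the new estimate
        if k == maxiter
            @warn "Maximum number of iterations reached."
            break
        end
    end
    return x
end

# Example use. Note how the problem (f, dfdx) is defined separately
# from the algorithm and simply passed in as arguments.
f(x) = x * exp(x) - 2
dfdx(x) = exp(x) * (x + 1)
newton(f, dfdx, 1.0)
```

Returning the whole vector of iterates, rather than only the last one, makes it easy to study the error sequence afterward, as in the convergence test sketched earlier.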

4.3.3 Exercises

For each of Exercises 1–3, do the following steps.

(a) ✍ Rewrite the equation into the standard form for rootfinding, $f(x) = 0$, and compute $f'(x)$.

(b) ⌨ Make a plot of $f$ over the given interval and determine how many roots lie in the interval.

(c) ⌨ Use nlsolve with ftol=1e-15 to find a reference value for each root.

(d) ⌨ Use Function 4.3.2 to find each root.

(e) ⌨ For one of the roots, use the errors in the Newton sequence to determine numerically whether the convergence is roughly quadratic.

  1. $x^2 = e^{-x}$, over $[-2,2]$

  2. $2x = \tan x$, over $[-0.2,1.4]$

  3. $e^{x+1} = 2+x$, over $[-2,2]$


  4. ⌨ Plot the function $f(x) = x^{-2} - \sin x$ on the interval $x \in [0.5,10]$. For each initial value $x_1=1,\, x_1=2,\,\ldots,\, x_1=7$, apply Function 4.3.2 to $f$, and make a table showing $x_1$ and the resulting root found by the method. In which case does the iteration converge to a root other than the one closest to it? Use the plot to explain why that happened.

  5. ✍ Show that if $f(x) = x^{-1} - b$ for nonzero $b$, then Newton’s iteration converging to the root $r = 1/b$ can be implemented without performing any divisions.

  6. ✍ Discuss what happens when Newton’s method is applied to find a root of $f(x) = \operatorname{sign}(x) \sqrt{|x|}$, starting at $x_1 \ne 0$. (Hint: Write out both $f(x)$ and $f'(x)$ as piecewise functions.)

  7. ✍ In the case of a multiple root, where $f(r) = f'(r) = 0$, the derivation of the quadratic error convergence in (4.3.7) is invalid. Redo the derivation to show that in this circumstance and with $f''(r) \neq 0$, the error converges only linearly.

  8. ✍ In Function 4.3.2 and elsewhere, the actual error is not available, so we use $|x_k - x_{k-1}|$ as an approximate indicator of error to determine when to stop the iteration. Find an example that foils this indicator; that is, a sequence $\{x_k\}$ such that

    $$\lim_{k\rightarrow \infty} (x_k - x_{k-1}) = 0,$$

    but $\{x_k\}$ diverges. (Hint: You have seen such sequences in calculus.) Hence the need for residual tolerances and safety valves in the code!