Newton’s method is the cornerstone of rootfinding. We introduce the key idea with an example in Demo 4.3.1 .
Example 4.3.1 (Graphical interpretation of Newton’s method)
Example 4.3.1 Suppose we want to find a root of the function
f(x) = x * exp(x) - 2
using Plots
plot(f, 0, 1.5;
label="function", legend=:topleft,
grid=:y, ylim=[-2, 4], xlabel=L"x", ylabel=L"y")
From the graph, it is clear that there is a root near x = 1 x=1 x = 1 . So we call that our initial guess, x 1 x_1 x 1 .
x₁ = 1
y₁ = f(x₁)
scatter!([x₁], [y₁], label="initial point")
Next, we can compute the tangent line at the point ( x 1 , f ( x 1 ) ) \bigl(x_1,f(x_1)\bigr) ( x 1 , f ( x 1 ) ) , using the derivative.
df_dx(x) = exp(x) * (x + 1)
m₁ = df_dx(x₁)
tangent = x -> y₁ + m₁ * (x - x₁)
plot!(tangent, 0, 1.5, l=:dash, label="tangent line",
title="Tangent line approximation")
In lieu of finding the root of f f f itself, we settle for finding the root of the tangent line approximation, which is trivial. Call this x 2 x_2 x 2 , our next approximation to the root.
@show x₂ = x₁ - y₁ / m₁
scatter!([x₂], [0], label="tangent root", title="First iteration")
The residual (i.e., value of f f f ) is smaller than before, but not zero. So we repeat the process with a new tangent line based on the latest point on the curve.
plot(f, 0.82, 0.87;
label="function", legend=:topleft,
xlabel=L"x", ylabel=L"y",
title="Second iteration")
scatter!([x₂], [y₂], label="starting point")
m₂ = df_dx(x₂)
tangent = x -> y₂ + m₂ * (x - x₂)
plot!(tangent, 0.82, 0.87; l=:dash, label="tangent line")
@show x₃ = x₂ - y₂ / m₂
scatter!([x₃], [0], label="tangent root")
Judging by the residual, we appear to be getting closer to the true root each time.
Example 4.3.1 Suppose we want to find a root of the function
f = @(x) x .* exp(x) - 2;
clf, fplot(f, [0, 1.5])
xlabel('x'), ylabel('y')
set(gca, 'ygrid', 'on')
title('Objective function')
From the graph, it is clear that there is a root near x = 1 x=1 x = 1 . So we call that our initial guess, x 1 x_1 x 1 .
x1 = 1;
y1 = f(x1)
hold on, scatter(x1, y1, 'k')
Next, we can compute the tangent line at the point ( x 1 , f ( x 1 ) ) \bigl(x_1,f(x_1)\bigr) ( x 1 , f ( x 1 ) ) , using the derivative.
df_dx = @(x) exp(x) .* (x + 1);
slope1 = df_dx(x1);
tangent1 = @(x) y1 + slope1 * (x - x1);
axis(axis)
fplot(tangent1, [0, 1.5], 'k--')
title('Function and tangent line')
In lieu of finding the root of f f f itself, we settle for finding the root of the tangent line approximation, which is trivial. Call this x 2 x_2 x 2 , our next approximation to the root.
x2 = x1 - y1 / slope1
scatter(x2, 0, 'r')
title('Root of the tangent')
The residual (i.e., value of f f f ) is smaller than before, but not zero. So we repeat the process with a new tangent line based on the latest point on the curve.
cla, axis auto
fplot(f, [0.83, 0.88])
scatter(x2, y2, 'k')
slope2 = df_dx(x2);
tangent2 = @(x) y2 + slope2 * (x - x2);
axis(axis)
fplot(tangent2, [0.8, 0.9], 'k--')
x3 = x2 - y2 / slope2;
scatter(x3, 0, 'r')
title('Next iteration')
Judging by the residual, we appear to be getting closer to the true root each time.
Example 4.3.1 Suppose we want to find a root of this function:
f = lambda x: x * exp(x) - 2
xx = linspace(0, 1.5, 400)
fig, ax = subplots()
ax.plot(xx, f(xx), label="function")
ax.grid()
ax.set_xlabel("$x$")
ax.set_ylabel("$y$");
From the graph, it is clear that there is a root near x = 1 x=1 x = 1 . So we call that our initial guess, x 1 x_1 x 1 .
x1 = 1
y1 = f(x1)
ax.plot(x1, y1, "ko", label="initial point")
ax.legend()
fig
Next, we can compute the tangent line at the point ( x 1 , f ( x 1 ) ) \bigl(x_1,f(x_1)\bigr) ( x 1 , f ( x 1 ) ) , using the derivative.
df_dx = lambda x: exp(x) * (x + 1)
slope1 = df_dx(x1)
tangent1 = lambda x: y1 + slope1 * (x - x1)
ax.plot(xx, tangent1(xx), "--", label="tangent line")
ax.set_ylim(-2, 4)
ax.legend()
fig
In lieu of finding the root of f f f itself, we settle for finding the root of the tangent line approximation, which is trivial. Call this x 2 x_2 x 2 , our next approximation to the root.
x2 = x1 - y1 / slope1
ax.plot(x2, 0, "ko", label="tangent root")
ax.legend()
fig
The residual (i.e., value of f f f ) is smaller than before, but not zero. So we repeat the process with a new tangent line based on the latest point on the curve.
xx = linspace(0.83, 0.88, 200)
plot(xx, f(xx))
plot(x2, y2, "ko")
grid(), xlabel("$x$"), ylabel("$y$")
slope2 = df_dx(x2)
tangent2 = lambda x: y2 + slope2 * (x - x2)
plot(xx, tangent2(xx), "--")
x3 = x2 - y2 / slope2
plot(x3, 0, "ko")
title("Second iteration");
Judging by the residual, we appear to be getting closer to the true root each time.
Using general notation, if we have a root approximation x k x_k x k , we can construct a linear model of f ( x ) f(x) f ( x ) using the classic formula for the tangent line of a differentiable function,
q ( x ) = f ( x k ) + f ′ ( x k ) ( x − x k ) . q(x) = f(x_k) + f'(x_k)(x-x_k). q ( x ) = f ( x k ) + f ′ ( x k ) ( x − x k ) . Finding the root of q ( x ) = 0 q(x)=0 q ( x ) = 0 is trivial. We define the next approximation by the condition q ( x k + 1 ) = 0 q(x_{k+1})=0 q ( x k + 1 ) = 0 , which leads to the following.
Algorithm 4.3.1 (Newton’s method)
Given a function f f f , its derivative, f ′ f' f ′ , and an initial value x 1 x_1 x 1 , iteratively define
x k + 1 = x k − f ( x k ) f ′ ( x k ) , k = 1 , 2 , … . x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \qquad k=1,2,\ldots. x k + 1 = x k − f ′ ( x k ) f ( x k ) , k = 1 , 2 , … . 4.3.1 Convergence ¶ The graphs of Demo 4.3.1 suggest why the Newton iteration may converge to a root: any differentiable function looks more and more like its tangent line as we zoom in to the point of tangency. Yet it is far from clear that it must converge, or at what rate it will do so. The matter of the convergence rate is fairly straightforward to resolve. Define the error sequence
ϵ k = x k − r , k = 1 , 2 , … , \epsilon_k = x_k - r , \quad k=1,2,\ldots, ϵ k = x k − r , k = 1 , 2 , … , where r r r is the limit of the sequence and f ( r ) = 0 f(r)=0 f ( r ) = 0 . Exchanging x x x -values for ε-values in (4.3.2) gives
ϵ k + 1 + r = ϵ k + r − f ( r + ϵ k ) f ′ ( r + ϵ k ) . \epsilon_{k+1}+r = \epsilon_k + r - \frac{f(r+\epsilon_k)}{f'(r+\epsilon_k)}. ϵ k + 1 + r = ϵ k + r − f ′ ( r + ϵ k ) f ( r + ϵ k ) . We assume that ∣ ϵ k ∣ → 0 |\epsilon_k|\to 0 ∣ ϵ k ∣ → 0 ; eventually, the errors remain as small as we please forever. Then a Taylor expansion of f f f about x = r x=r x = r gives
ϵ k + 1 = ϵ k − f ( r ) + ϵ k f ′ ( r ) + 1 2 ϵ k 2 f ′ ′ ( r ) + O ( ϵ k 3 ) f ′ ( r ) + ϵ k f ′ ′ ( r ) + O ( ϵ k 2 ) . \epsilon_{k+1} = \epsilon_k - \frac{ f(r) + \epsilon_kf'(r) + \frac{1}{2}\epsilon_k^2f''(r) +
O(\epsilon_k^3)}{ f'(r) + \epsilon_kf''(r) + O(\epsilon_k^2)}. ϵ k + 1 = ϵ k − f ′ ( r ) + ϵ k f ′′ ( r ) + O ( ϵ k 2 ) f ( r ) + ϵ k f ′ ( r ) + 2 1 ϵ k 2 f ′′ ( r ) + O ( ϵ k 3 ) . We use the fact that f ( r ) = 0 f(r)=0 f ( r ) = 0 and additionally assume now that r r r is a simple root, i.e., f ′ ( r ) ≠ 0 f'(r)\neq 0 f ′ ( r ) = 0 . Then
ϵ k + 1 = ϵ k − ϵ k [ 1 + 1 2 f ′ ′ ( r ) f ′ ( r ) ϵ k + O ( ϵ k 2 ) ] [ 1 + f ′ ′ ( r ) f ′ ( r ) ϵ k + O ( ϵ k 2 ) ] − 1 . \epsilon_{k+1} = \epsilon_k - \epsilon_k \left[ 1 + \dfrac{1}{2}\dfrac{f''(r)}{f'(r)} \epsilon_k
+ O(\epsilon_k^2)\right] \, \left[ 1 + \dfrac{f''(r)}{f'(r)}\epsilon_k + O(\epsilon_k^2)\right]^{-1}. ϵ k + 1 = ϵ k − ϵ k [ 1 + 2 1 f ′ ( r ) f ′′ ( r ) ϵ k + O ( ϵ k 2 ) ] [ 1 + f ′ ( r ) f ′′ ( r ) ϵ k + O ( ϵ k 2 ) ] − 1 . The series in the denominator is of the form 1 / ( 1 + z ) 1/(1+z) 1/ ( 1 + z ) . Provided ∣ z ∣ < 1 |z|<1 ∣ z ∣ < 1 , this is the limit of the geometric series 1 − z + z 2 − z 3 + ⋯ 1-z+z^2-z^3 + \cdots 1 − z + z 2 − z 3 + ⋯ . Keeping only the lowest-order terms, we derive
ϵ k + 1 = ϵ k − ϵ k [ 1 + 1 2 f ′ ′ ( r ) f ′ ( r ) ϵ k + O ( ϵ k 2 ) ] [ 1 − f ′ ′ ( r ) f ′ ( r ) ϵ k + O ( ϵ k 2 ) ] = 1 2 f ′ ′ ( r ) f ′ ( r ) ϵ k 2 + O ( ϵ k 3 ) . \begin{align*}
\epsilon_{k+1} &= \epsilon_k - \epsilon_k \left[ 1 + \dfrac{1}{2}\dfrac{f''(r)}{f'(r)} \epsilon_k + O(\epsilon_k^2) \right] \, \left[ 1 - \dfrac{f''(r)}{f'(r)}
\epsilon_k + O(\epsilon_k^2) \right]\\
&= \frac{1}{2}\, \frac{f''(r)}{f'(r)} \epsilon_k^2 + O(\epsilon_k^3).
\end{align*} ϵ k + 1 = ϵ k − ϵ k [ 1 + 2 1 f ′ ( r ) f ′′ ( r ) ϵ k + O ( ϵ k 2 ) ] [ 1 − f ′ ( r ) f ′′ ( r ) ϵ k + O ( ϵ k 2 ) ] = 2 1 f ′ ( r ) f ′′ ( r ) ϵ k 2 + O ( ϵ k 3 ) . Definition 4.3.1 (Quadratic convergence)
Suppose a sequence x k x_k x k approaches limit x ∗ x^* x ∗ . If the error sequence ϵ k = x k − x ∗ \epsilon_k=x_k - x^* ϵ k = x k − x ∗ satisfies
lim k → ∞ ∣ ϵ k + 1 ∣ ∣ ϵ k ∣ 2 = L \lim_{k\to\infty} \frac{|\epsilon_{k+1}|}{|\epsilon_k|^2} = L k → ∞ lim ∣ ϵ k ∣ 2 ∣ ϵ k + 1 ∣ = L for a positive constant L L L , then the sequence has quadratic convergence to the limit.
Recall that linear convergence is identifiable by trending toward a straight line on a log-linear plot of the error. When the convergence is quadratic, no such straight line exists—the convergence keeps getting steeper. As a numerical test, note that ∣ ϵ k + 1 ∣ ≈ K ∣ ϵ k ∣ 2 |\epsilon_{k+1}|\approx K |\epsilon_{k}|^2 ∣ ϵ k + 1 ∣ ≈ K ∣ ϵ k ∣ 2 implies that as k → ∞ k\to\infty k → ∞ ,
log ∣ ϵ k + 1 ∣ ≈ 2 log ∣ ϵ k ∣ + L , log ∣ ϵ k + 1 ∣ log ∣ ϵ k ∣ ≈ 2 + L log ∣ ϵ k ∣ → 2. \begin{split}
\log |\epsilon_{k+1}| & \approx 2 \log |\epsilon_{k}| + L,\\
\frac{\log |\epsilon_{k+1}|}{\log |\epsilon_{k}|} &\approx 2 + \frac{L}{\log |\epsilon_{k}|} \to 2.
\end{split} log ∣ ϵ k + 1 ∣ log ∣ ϵ k ∣ log ∣ ϵ k + 1 ∣ ≈ 2 log ∣ ϵ k ∣ + L , ≈ 2 + log ∣ ϵ k ∣ L → 2. Example 4.3.2 (Convergence of Newton’s method)
Example 4.3.2 We again look at finding a solution of x e x = 2 x e^x=2 x e x = 2 near x = 1 x=1 x = 1 . To apply Newton’s method, we need to calculate values of both the residual function f f f and its derivative.
f(x) = x * exp(x) - 2;
df_dx(x) = exp(x) * (x + 1);
We don’t know the exact root, so we use nlsolve
to determine a proxy for it.
using NLsolve
r = nlsolve(x -> f(x[1]), [1.0]).zero
1-element Vector{Float64}:
0.852605502013726
We use x 1 = 1 x_1=1 x 1 = 1 as a starting guess and apply the iteration in a loop, storing the sequence of iterates in a vector.
x = [1; zeros(4)]
for k = 1:4
x[k+1] = x[k] - f(x[k]) / df_dx(x[k])
end
x
5-element Vector{Float64}:
1.0
0.8678794411714423
0.8527833734164099
0.8526055263689221
0.852605502013726
Here is the sequence of errors.
5-element Vector{Float64}:
0.14739449798627402
0.015273939157716354
0.00017787140268388235
2.435519608212644e-8
0.0
Because the error reaches machine epsilon so rapidly, we’re going to use extended precision to allow us to take a few more iterations. We’ll take the last iteration as the most accurate root estimate.
Tip
A BigFloat
uses 256 bits of precision, rather than 53 in Float64
. But arithmetic is done by software emulation and is much slower.
x = [BigFloat(1); zeros(7)]
for k = 1:7
x[k+1] = x[k] - f(x[k]) / df_dx(x[k])
end
r = x[end]
0.8526055020137254913464724146953174668984533001514035087721073946525150656742605
ϵ = @. Float64(x[1:end-1] - r)
7-element Vector{Float64}:
0.14739449798627452
0.01527393915771683
0.00017787140268443004
2.435519656311045e-8
4.56680051680793e-16
1.6056572825272187e-31
1.9848810119594387e-62
The exponents in the scientific notation definitely suggest a squaring sequence. We can check the evolution of the ratio in (4.3.9) .
logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [1:7 ϵ logerr ratios]
The clear convergence to 2 above constitutes good evidence of quadratic convergence.
Example 4.3.2 We again look at finding a solution of x e x = 2 x e^x=2 x e x = 2 near x = 1 x=1 x = 1 . To apply Newton’s method, we need to calculate values of both the residual function f f f and its derivative.
f = @(x) x.*exp(x) - 2;
df_dx = @(x) exp(x).*(x+1);
We don’t know the exact root, so we use nlsolve
to determine a proxy for it.
format long, r = fzero(f,1)
We use x 1 = 1 x_1=1 x 1 = 1 as a starting guess and apply the iteration in a loop, storing the sequence of iterates in a vector.
x = 1;
for k = 1:6
x(k+1) = x(k) - f(x(k)) / df_dx(x(k));
end
x
Here is the sequence of errors.
format short e
err = x' - r
The exponents in the scientific notation definitely suggest a squaring sequence. We can check the evolution of the ratio in (4.3.9) .
format short
logerr = log(abs(err))
The clear convergence to 2 above constitutes good evidence of quadratic convergence.
Example 4.3.2 We again look at finding a solution of x e x = 2 x e^x=2 x e x = 2 near x = 1 x=1 x = 1 . To apply Newton’s method, we need to calculate values of both the residual function f f f and its derivative.
f = lambda x: x * exp(x) - 2
df_dx = lambda x: exp(x) * (x + 1)
We don’t know the exact root, so we use nlsolve
to determine a proxy for it.
r = root_scalar(f, bracket=[0.8, 1.0]).root
print(r)
We use x 1 = 1 x_1=1 x 1 = 1 as a starting guess and apply the iteration in a loop, storing the sequence of iterates in a vector.
x = ones(5)
for k in range(4):
x[k + 1] = x[k] - f(x[k]) / df_dx(x[k])
print(x)
[1. 0.86787944 0.85278337 0.85260553 0.8526055 ]
Here is the sequence of errors.
[1.47394498e-01 1.52739392e-02 1.77871403e-04 2.43551965e-08
4.44089210e-16]
The exponents in the scientific notation definitely suggest a squaring sequence. We can check the evolution of the ratio in (4.3.9) .
logerr = log(abs(err))
for i in range(len(err) - 1):
print(logerr[i+1] / logerr[i])
2.1840144823399648
2.064863881067786
2.030299689916648
2.01651205997716
The clear convergence to 2 above constitutes good evidence of quadratic convergence.
Let’s summarize the assumptions made to derive quadratic convergence as given by (4.3.7) :
The residual function f f f has to have enough continuous derivatives to make the Taylor series expansion valid. Often this is stated as f f f having sufficient smoothness . This is usually not a problem, but see Exercise 6 . We required f ′ ( r ) ≠ 0 f'(r)\neq 0 f ′ ( r ) = 0 , meaning that r r r must be a simple root. See Exercise 7 to investigate what happens at a multiple root. We assumed that the sequence converged, which is not easy to guarantee in any particular case. In fact,
finding a starting value from which the Newton iteration converges is often the most challenging part of a rootfinding problem. We will try to deal with this issue in Quasi-Newton methods . 4.3.2 Implementation ¶ Our implementation of Newton’s iteration is given in Function 4.3.2 . It accepts functions that evaluate f f f and f ′ f' f ′ and the starting value x 1 x_1 x 1 as input arguments. Beginning programmers are tempted to embed f f f and f ′ f' f ′ directly into the code, but there are two good reasons not to do so. First, each new rootfinding problem would require its own copy of the code, creating a lot of duplication. Second, you may want to try more than one rootfinding algorithm for a particular problem, and keeping the definition of the problem separate from the algorithm for its solution makes this task much easier.
Newton’s method1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
"""
newton(f, df_dx, x₁ [;maxiter,f tol, xtol])
Use Newton's method to find a root of `f` starting from `x₁`, where
`df_dx` is the derivative of `f`. Returns a vector of root estimates.
The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.
"""
function newton(f, df_dx, x₁; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
x = [float(x₁)]
y = f(x₁)
Δx = Inf # for initial pass below
k = 1
while (abs(Δx) > xtol) && (abs(y) > ftol)
dy_dx = df_dx(x[k])
Δx = -y / dy_dx # Newton step
push!(x, x[k] + Δx) # append new estimate
k += 1
y = f(x[k])
if k == maxiter
@warn "Maximum number of iterations reached."
break # exit loop
end
end
return x
end
About the code
Function 4.3.2 accepts keyword arguments . In the function declaration, these follow the semicolon, and when the function is called, they may be supplied as keyword=value
in the argument list. Here, these arguments are also given default values by the assignments within the declaration. This arrangement is useful when there are multiple optional arguments, because the ordering of them doesn’t matter.
The break
statement, seen here in line 25, causes an immediate exit from the innermost loop in which it is called. It is often used as a safety valve to escape an iteration that may not be able to terminate otherwise.
Newton’s method1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
function x = newton(f,dfdx,x1)
% NEWTON Newton's method for a scalar equation.
% Input:
% f objective function
% dfdx derivative function
% x1 initial root approximation
% Output
% x vector of root approximations (last one is best)
% Operating parameters.
funtol = 100*eps; xtol = 100*eps; maxiter = 40;
x = x1;
y = f(x1);
dx = Inf; % for initial pass below
k = 1;
while (abs(dx) > xtol) && (abs(y) > funtol) && (k < maxiter)
dydx = dfdx(x(k));
dx = -y/dydx; % Newton step
x(k+1) = x(k) + dx;
k = k+1;
y = f(x(k));
end
if k==maxiter
warning('Maximum number of iterations reached.')
end
Newton’s method1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def newton(f, dfdx, x1):
"""
newton(f, dfdx, x1)
Use Newton's method to find a root of f starting from x1, where dfdx is the
derivative of f. Returns a vector of root estimates.
"""
# Operating parameters.
eps = np.finfo(float).eps
funtol = 100 * eps
xtol = 100 * eps
maxiter = 40
x = np.zeros(maxiter)
x[0] = x1
y = f(x1)
dx = np.inf # for initial pass below
k = 0
while (abs(dx) > xtol) and (abs(y) > funtol) and (k < maxiter):
dydx = dfdx(x[k])
dx = -y / dydx # Newton step
x[k + 1] = x[k] + dx # new estimate
k = k + 1
y = f(x[k])
if k == maxiter:
warnings.warn("Maximum number of iterations reached.")
return x[:k+1]
About the code
Function 4.3.2 accepts keyword arguments . In the function declaration, these follow the semicolon, and when the function is called, they may be supplied as keyword=value
in the argument list. Here, these arguments are also given default values by the assignments within the declaration. This arrangement is useful when there are multiple optional arguments, because the ordering of them doesn’t matter.
The break
statement, seen here in line 25, causes an immediate exit from the innermost loop in which it is called. It is often used as a safety valve to escape an iteration that may not be able to terminate otherwise.
Function 4.3.2 also deals with a thorny practical issue: how to stop the iteration. It adopts a three-part criterion. First, it monitors the difference between successive root estimates, ∣ x k − x k − 1 ∣ |x_k-x_{k-1}| ∣ x k − x k − 1 ∣ , which is used as a stand-in for the unknown error ∣ x k − r ∣ |x_k-r| ∣ x k − r ∣ . In addition, it monitors the residual ∣ f ( x k ) ∣ |f(x_k)| ∣ f ( x k ) ∣ , which is equivalent to the backward error and more realistic to control in badly conditioned problems (see The rootfinding problem ). If either of these quantities is considered to be sufficiently small, the iteration ends. Finally, we need to protect against the possibility of a nonconvergent iteration, so the procedure terminates with a warning if a maximum number of iterations is exceeded.
Example 4.3.3 (Using Newton’s method)
Example 4.3.3 Suppose we want to evaluate the inverse of the function h ( x ) = e x − x h(x)=e^x-x h ( x ) = e x − x . This means solving y = e x − x y=e^x-x y = e x − x for x x x when y y y is given, which has no elementary form. If a value of y y y is given numerically, though, we simply have a rootfinding problem for f ( x ) = e x − x − y f(x)=e^x-x-y f ( x ) = e x − x − y .
Tip
The enumerate
function produces a pair of values for each iteration: a positional index and the corresponding contents.
g(x) = exp(x) - x
dg_dx(x) = exp(x) - 1
y = range(g(0), g(2), 200)
x = zeros(length(y))
for (i, y) in enumerate(y)
f(x) = g(x) - y
df_dx(x) = dg_dx(x)
r = FNC.newton(f, df_dx, y)
x[i] = r[end]
end
plot(g, 0, 2, aspect_ratio=1, label=L"g(x)")
plot!(y, x, label=L"g^{-1}(y)", title="Function and its inverse")
plot!(x -> x, 0, maximum(y), label="", l=(:dash, 1), color=:black)
Example 4.3.3 Suppose we want to evaluate the inverse of the function h ( x ) = e x − x h(x)=e^x-x h ( x ) = e x − x . This means solving y = e x − x y=e^x-x y = e x − x for x x x when y y y is given, which has no elementary form. If a value of y y y is given numerically, though, we simply have a rootfinding problem for f ( x ) = e x − x − y f(x)=e^x-x-y f ( x ) = e x − x − y .
Tip
When a function is created, it can refer to any variables in scope at that moment. Those values are locked in to the definition, which is called a closure . If the enclosed variables change values later, the function still uses the values it was created with.
h = @(x) exp(x) - x;
dh_dx = @(x) exp(x) - 1;
y_ = linspace(h(0), h(2), 200);
x_ = zeros(size(y_));
for i = 1:length(y_)
f = @(x) h(x) - y_(i);
df_dx = @(x) dh_dx(x);
x = newton(f, df_dx, 1); x_(i) = x(end);
end
clf, fplot(h, [0, 2])
hold on, axis equal
plot(y_, x_)
plot([0, max(y_)], [0, max(y_)], 'k--')
xlabel('x'), ylabel('y')
legend('h(x)', 'inverse', 'y=x');
Example 4.3.3 Suppose we want to evaluate the inverse of the function h ( x ) = e x − x h(x)=e^x-x h ( x ) = e x − x . This means solving y = h ( x ) y=h(x) y = h ( x ) , or h ( x ) − y = 0 h(x)-y=0 h ( x ) − y = 0 , for x x x when y y y is given. That equation has no solution in terms of elementary functions. If a value of y y y is given numerically, though, we simply have a rootfinding problem for f ( x ) = e x − x − y f(x)=e^x-x-y f ( x ) = e x − x − y .
Tip
The enumerate
function produces a pair of values for each iteration: a positional index and the corresponding contents.
h = lambda x: exp(x) - x
dh_dx = lambda x: exp(x) - 1
y_ = linspace(h(0), h(2), 200)
x_ = zeros(y_.shape)
for (i, y) in enumerate(y_):
f = lambda x: h(x) - y
df_dx = lambda x: dh_dx(x)
x = FNC.newton(f, df_dx, y)
x_[i] = x[-1]
plot(x_, y_, label="$y=h(x)$")
plot(y_, x_, label="$y=h^{-1}(x)$")
plot([0, max(y_)], [0, max(y_)], 'k--', label="")
title("Function and its inverse")
xlabel("x"), ylabel("y"), axis("equal")
ax.grid()
legend();
4.3.3 Exercises ¶ For each of Exercises 1–3, do the following steps.
(a) ✍ Rewrite the equation into the standard form for rootfinding, f ( x ) = 0 f(x) = 0 f ( x ) = 0 , and compute f ′ ( x ) f'(x) f ′ ( x ) .
(b) ⌨ Make a plot of f f f over the given interval and determine how many roots lie in the interval.
(c) ⌨ Use nlsolve
with ftol=1e-15
to find a reference value for each root.
(d) ⌨ Use Function 4.3.2 to find each root.
(e) ⌨ For one of the roots, use the errors in the Newton sequence to determine numerically whether the convergence is roughly quadratic.
x 2 = e − x x^2=e^{-x} x 2 = e − x , over [ − 2 , 2 ] [-2,2] [ − 2 , 2 ]
2 x = tan x 2x = \tan x 2 x = tan x , over [ − 0.2 , 1.4 ] [-0.2,1.4] [ − 0.2 , 1.4 ]
e x + 1 = 2 + x e^{x+1}=2+x e x + 1 = 2 + x , over [ − 2 , 2 ] [-2,2] [ − 2 , 2 ]
⌨ Plot the function f ( x ) = x − 2 − sin x f(x)=x^{-2} - \sin x f ( x ) = x − 2 − sin x on the interval x ∈ [ 0.5 , 10 ] x \in [0.5,10] x ∈ [ 0.5 , 10 ] . For each initial value x 1 = 1 , x 1 = 2 , … , x 1 = 7 x_1=1,\, x_1=2,\,\ldots,\, x_1=7 x 1 = 1 , x 1 = 2 , … , x 1 = 7 , apply Function 4.3.2 to f f f , and make a table showing x 1 x_1 x 1 and the resulting root found by the method. In which case does the iteration converge to a root other than the one closest to it? Use the plot to explain why that happened.
✍ Show that if f ( x ) = x − 1 − b f(x)=x^{-1}-b f ( x ) = x − 1 − b for nonzero b b b , then Newton’s iteration converging to the root r = 1 / b r=1/b r = 1/ b can be implemented without performing any divisions.
✍ Discuss what happens when Newton’s method is applied to find a root of f ( x ) = sign ( x ) ∣ x ∣ f(x) = \operatorname{sign}(x) \sqrt{|x|} f ( x ) = sign ( x ) ∣ x ∣ , starting at x 1 ≠ 0 x_1\ne 0 x 1 = 0 . (Hint: Write out both f ( x ) f(x) f ( x ) and f ′ ( x ) f'(x) f ′ ( x ) as piecewise functions.) ✍ In the case of a multiple root, where f ( r ) = f ′ ( r ) = 0 f(r)=f'(r)=0 f ( r ) = f ′ ( r ) = 0 , the derivation of the quadratic error convergence in (4.3.7) is invalid. Redo the derivation to show that in this circumstance and with f ′ ′ ( r ) ≠ 0 f''(r)\neq 0 f ′′ ( r ) = 0 , the error converges only linearly.
✍ In Function 4.3.2 and elsewhere, the actual error is not available, so we use ∣ x k − x k − 1 ∣ |x_k-x_{k-1}| ∣ x k − x k − 1 ∣ as an approximate indicator of error to determine when to stop the iteration. Find an example that foils this indicator; that is, a sequence { x k } \{x_k\} { x k } such that
lim k → ∞ ( x k − x k − 1 ) = 0 , \lim_{k\rightarrow \infty} (x_k-x_{k-1}) = 0, k → ∞ lim ( x k − x k − 1 ) = 0 , but { x k } \{x_k\} { x k } diverges. (Hint: You have seen such sequences in calculus.) Hence the need for residual tolerances and safety valves in the code!