Functions¶
Newton’s method
"""
    newton(f, df_dx, x₁ [; maxiter, ftol, xtol])

Use Newton's method to find a root of `f` starting from `x₁`, where
`df_dx` is the derivative of `f`. Returns a vector of root estimates.

The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.
"""
function newton(f, df_dx, x₁; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
    x = [float(x₁)]
    y = f(x₁)
    Δx = Inf    # for initial pass below
    k = 1

    while (abs(Δx) > xtol) && (abs(y) > ftol)
        dy_dx = df_dx(x[k])
        Δx = -y / dy_dx            # Newton step
        push!(x, x[k] + Δx)        # append new estimate

        k += 1
        y = f(x[k])
        if k == maxiter
            @warn "Maximum number of iterations reached."
            break    # exit loop
        end
    end
    return x
end
About the code
Function 4.3.2 accepts keyword arguments. In the function declaration, these follow the semicolon, and when the function is called, they may be supplied as keyword=value
in the argument list. Here, these arguments are also given default values by the assignments within the declaration. This arrangement is useful when there are multiple optional arguments, because the ordering of them doesn’t matter.
The break statement, seen here near the end of the loop body, causes an immediate exit from the innermost loop in which it is called. It is often used as a safety valve to escape an iteration that may not be able to terminate otherwise.
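As a quick illustration (added here; not part of the book's listing), the function above can be called with any subset of the keywords, in any order. In the examples below it is accessed as FNC.newton.

x = newton(x -> x^2 - 2, x -> 2x, 1.0; xtol = 1e-10, maxiter = 10)   # keyword order is arbitrary
x[end]    # final estimate, close to √2 ≈ 1.4142135623730951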
Secant method
"""
    secant(f, x₁, x₂ [; maxiter, ftol, xtol])

Use the secant method to find a root of `f` starting from `x₁` and `x₂`.
Returns a vector of root estimates.

The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.
"""
function secant(f, x₁, x₂; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
    x = [float(x₁), float(x₂)]
    y₁ = f(x₁)
    Δx, y₂ = Inf, Inf    # for initial pass in the loop below
    k = 2

    while (abs(Δx) > xtol) && (abs(y₂) > ftol)
        y₂ = f(x[k])
        Δx = -y₂ * (x[k] - x[k-1]) / (y₂ - y₁)    # secant step
        push!(x, x[k] + Δx)    # append new estimate

        k += 1
        y₁ = y₂    # current f-value becomes the old one next time

        if k == maxiter
            @warn "Maximum number of iterations reached."
            break    # exit loop
        end
    end
    return x
end
About the code
Because we want to observe the convergence of the method, Function 4.4.2 stores and returns the entire sequence of root estimates. However, only the most recent two are needed by the iterative formula. This is demonstrated by the use of y₁ and y₂ for the two most recent values of $f$.
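Because only the two most recent values are needed, the history could be dropped entirely. Here is a minimal sketch (added for illustration, not the book's function) that carries just two points along and returns only the final estimate.

function secant_last(f, x₁, x₂; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
    x_old, x_new = float(x₁), float(x₂)
    y_old, y_new = f(x_old), f(x_new)
    for k in 2:maxiter
        Δx = -y_new * (x_new - x_old) / (y_new - y_old)   # secant step
        x_old, y_old = x_new, y_new    # keep only the two most recent values
        x_new += Δx
        y_new = f(x_new)
        if abs(Δx) <= xtol || abs(y_new) <= ftol
            break
        end
    end
    return x_new
end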
Newton’s method for systems
"""
    newtonsys(f, jac, x₁ [; maxiter, ftol, xtol])

Use Newton's method to find a root of a system of equations, starting
from `x₁`. The functions `f` and `jac` should return the residual vector
and the Jacobian matrix, respectively. Returns the history of root
estimates as a vector of vectors.

The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.
"""
function newtonsys(f, jac, x₁; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
    x = [float(x₁)]
    y, J = f(x₁), jac(x₁)
    Δx = Inf    # for initial pass below
    k = 1

    while (norm(Δx) > xtol) && (norm(y) > ftol)
        Δx = -(J \ y)          # Newton step
        push!(x, x[k] + Δx)    # append to history
        k += 1
        y, J = f(x[k]), jac(x[k])

        if k == maxiter
            @warn "Maximum number of iterations reached."
            break
        end
    end
    return x
end
About the code
The output of Function 4.5.2 is a vector of vectors representing the entire history of root estimates. Since these should be in floating point, the starting value is converted with float
before the iteration starts.
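For instance (a tiny illustration added here), an integer starting vector is promoted before any iteration takes place:

float([1, 2])    # returns [1.0, 2.0], so the stored history can hold non-integer estimates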
Finite differences for Jacobian
"""
    fdjac(f, x₀ [, y₀])

Compute a finite-difference approximation of the Jacobian matrix for `f`
at `x₀`, where `y₀` = `f(x₀)` may be given.
"""
function fdjac(f, x₀, y₀ = f(x₀))
    δ = sqrt(eps()) * max(norm(x₀), 1)    # near-optimum FD step size
    m, n = length(y₀), length(x₀)
    if n == 1
        # Vector result for univariate functions.
        J = (f(x₀ + δ) - y₀) / δ
    else
        J = zeros(m, n)
        x = copy(x₀)
        for j in 1:n
            # Difference in the jth direction.
            x[j] += δ
            J[:, j] = (f(x) - y₀) / δ
            x[j] -= δ
        end
    end
    return J
end
About the code
Function 4.6.1 is written to accept the case where $f$ maps $n$ variables to $m$ values with $m \ne n$, in anticipation of Nonlinear least squares.
Note that a default value is given for the third argument y₀
, and it refers to earlier arguments in the list. The reason is that in some contexts, the caller of fdjac
may have already computed y₀
and can supply it without computational cost, while in other contexts, it must be computed fresh. The configuration here adapts to either situation.
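The same language feature can be seen in a one-line sketch (a hypothetical function added for illustration): the default value is computed from an earlier argument only when the caller omits it.

lastvalue(v, n = length(v)) = v[n]    # n defaults to a value derived from v

lastvalue([2, 4, 6])      # n defaults to 3; returns 6
lastvalue([2, 4, 6], 1)   # caller supplies n; returns 2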
Levenberg’s method
"""
    levenberg(f, x₁ [; maxiter, ftol, xtol])

Use Levenberg's quasi-Newton iteration to find a root of the system `f`
starting from `x₁`. Returns the history of root estimates as a vector of
vectors.

The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.
"""
function levenberg(f, x₁; maxiter = 40, ftol = 1e-12, xtol = 1e-12)
    x = [float(x₁)]
    yₖ = f(x₁)
    k = 1
    s = Inf
    A = fdjac(f, x[k], yₖ)    # start with FD Jacobian
    jac_is_new = true

    λ = 10
    while (norm(s) > xtol) && (norm(yₖ) > ftol)
        # Compute the proposed step.
        B = A' * A + λ * I
        z = A' * yₖ
        s = -(B \ z)

        x̂ = x[k] + s
        ŷ = f(x̂)

        # Do we accept the result?
        if norm(ŷ) < norm(yₖ)    # accept
            λ = λ / 10    # get closer to Newton
            # Broyden update of the Jacobian.
            A += (ŷ - yₖ - A * s) * (s' / (s' * s))
            jac_is_new = false

            push!(x, x̂)
            yₖ = ŷ
            k += 1
        else    # don't accept
            # Get closer to gradient descent.
            λ = 4λ
            # Re-initialize the Jacobian if it's out of date.
            if !jac_is_new
                A = fdjac(f, x[k], yₖ)
                jac_is_new = true
            end
        end

        if k == maxiter
            @warn "Maximum number of iterations reached."
            break
        end
    end
    return x
end
Examples¶
4.1 The rootfinding problem¶
Example 4.1.1
using Plots, SpecialFunctions
J₃(x) = besselj(3, x)
plot(J₃, 0, 20;
title="Bessel function",
xaxis=(L"x"), yaxis=(L"J_3(x)"), grid=:xy)
From the graph we see roots near 6, 10, 13, 16, and 19. We use nlsolve
from the NLsolve
package to find these roots accurately. It uses vector variables, so we have to code accordingly.
Tip
Type \omega
followed by Tab to get the character ω
.
The argument ftol=1e-14 below is called a keyword argument. Here it sets a goal for the maximum value of $|f|$ at the returned root estimate.
using NLsolve
ω = []
for guess = [6., 10., 13., 16., 19.]
s = nlsolve(x -> J₃(x[1]), [guess], ftol=1e-14)
append!(ω, s.zero)
end
y = J₃.(ω)
@pt :header=["root estimate", "function value"] [ω y]
scatter!(ω, y, title="Bessel function with roots")
If instead we seek values at which $J_3(x) = 0.2$, then we must find roots of the function $J_3(x) - 0.2$.
r = []
for guess = [3., 6., 10., 13.]
f(x) = J₃(x[1]) - 0.2
s = nlsolve(f, [guess], ftol=1e-14)
append!(r, s.zero)
end
scatter!(r, J₃.(r), title="Roots and other Bessel values")
Example 4.1.2
Consider first the function
f(x) = (x - 1) * (x - 2);
At the root $r=1$, we have $f'(r)=-1$. If the values of $f$ were perturbed at every point by a small amount of noise, we can imagine finding the root of the function drawn with a thick ribbon, giving a range of potential roots.
Tip
The syntax interval...
is called splatting and means to insert all the individual elements of interval
as a sequence.
interval = [0.8, 1.2]
plot(f, interval..., ribbon=0.03, aspect_ratio=1,
xlabel=L"x", yaxis=(L"f(x)", [-0.2, 0.2]))
scatter!([1], [0], title="Well-conditioned root")
The possible values for a perturbed root all lie within the interval where the ribbon intersects the $x$-axis. The width of that zone is about the same as the vertical thickness of the ribbon.
By contrast, consider the function
f(x) = (x - 1) * (x - 1.01);
Now $f'(1)=-0.01$, and the graph of $f$ will be much shallower near $x=1$. Look at the effect this has on our thick rendering:
plot(f, interval..., ribbon=0.03, aspect_ratio=1,
xlabel=L"x", yaxis=(L"f(x)", [-0.2, 0.2]))
scatter!([1], [0], title="Poorly-conditioned root")
The vertical displacements in this picture are exactly the same as before. But the potential horizontal displacement of the root is much wider. In fact, if we perturb the function entirely upward by the amount drawn here, the root disappears!
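As a rough numerical check (a sketch added here, reusing nlsolve from Example 4.1.1), shifting each function upward by a small ε moves its root near $x=1$ by roughly $\epsilon / |f'(1)|$, so the shallow function is about a hundred times more sensitive.

using NLsolve
ε = 1e-5
f₁(x) = (x - 1) * (x - 2)       # f₁'(1) = -1
f₂(x) = (x - 1) * (x - 1.01)    # f₂'(1) = -0.01
r₁ = nlsolve(x -> f₁(x[1]) + ε, [1.0], ftol=1e-14).zero[1]
r₂ = nlsolve(x -> f₂(x[1]) + ε, [1.0], ftol=1e-14).zero[1]
@show abs(r₁ - 1);    # ≈ 1e-5, about ε
@show abs(r₂ - 1);    # ≈ 1e-3, about 100ε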
4.2 Fixed-point iteration¶
Example 4.2.1
Let’s convert the roots of a quadratic polynomial to a fixed point problem.
using Polynomials
p = Polynomial([3.5, -4,1])
r = roots(p)
rmin, rmax = extrema(r)
@show rmin, rmax;
(rmin, rmax) = (1.2928932188134525, 2.7071067811865475)
We define $g(x) = x - p(x)$.
g(x) = x - p(x)
g (generic function with 1 method)
Intersections of $y=g(x)$ with the line $y=x$ are fixed points of $g$ and thus roots of $p$. (Only one is shown in the chosen plot range.)
using Plots
plt = plot([g x->x], 2, 3;
l=2, label=[L"y=g(x)" L"y=x"],
xlabel=L"x", ylabel=L"y",
aspect_ratio=1,
title="Finding a fixed point", legend=:bottomright)
If we evaluate $g(2.1)$, we get a value of almost 2.6, so this is not a fixed point.
x = 2.1;
y = g(x)
2.59
However, $g(x)$ is considerably closer to the fixed point at around 2.7 than $x$ is. Suppose then that we adopt $g(x)$ as our new value of $x$. Changing the $x$ coordinate in this way is the same as following a horizontal line over to the graph of $y=x$.
plot!([x, y], [y, y], arrow=true, color=3)
Now we can compute a new value for $y$. We leave $x$ alone here, so we travel along a vertical line to the graph of $g$.
x = y; y = g(x)
plot!([x, x], [x, y], arrow=true, color=4)
You see that we are in a position to repeat these steps as often as we like. Let’s apply them a few times and see the result.
for k = 1:5
plot!([x, y], [y, y], color=3);
x = y # y becomes the new x
y = g(x) # g(x) becomes the new y
plot!([x, x], [x, y], color=4)
end
plt
The process spirals in beautifully toward the fixed point we seek. Our last estimate has almost 4 accurate digits.
abs(y - rmax) / rmax
0.0001653094344995643
Now let’s try to find the other fixed point in the same way. We’ll use 1.3 as a starting approximation.
plt = plot([g x->x], 1, 2, l=2, label=["y=g(x)" "y=x"], aspect_ratio=1,
xlabel=L"x", ylabel=L"y", title="Divergence", legend=:bottomright)
x = 1.3; y = g(x);
arrow = false
for k = 1:5
plot!([x, y], [y, y], arrow=arrow, color=3)
x = y # y --> new x
y = g(x) # g(x) --> new y
plot!([x, x], [x, y], arrow=arrow, color=4)
if k > 2; arrow = true; end
end
plt
This time, the iteration is pushing us away from the correct answer.
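The different behavior at the two fixed points can be checked directly (a brief addition using the definitions above): since $g(x) = x - p(x)$, we have $g'(x) = 5 - 2x$, and the iteration can converge to a fixed point only where $|g'| < 1$.

dg(x) = 5 - 2x      # derivative of g(x) = x - p(x)
@show dg(rmax);     # ≈ -0.41, magnitude below 1: iteration converges there
@show dg(rmin);     # ≈  2.41, magnitude above 1: iteration diverges there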
Example 4.2.3
We revisit Demo 4.2.1 and investigate the observed convergence more closely. Recall that above we calculated $g' \approx -0.41$ at the convergent fixed point.
p = Polynomial([3.5, -4, 1])
r = roots(p)
rmin, rmax = extrema(r)
@show rmin, rmax;
(rmin, rmax) = (1.2928932188134525, 2.7071067811865475)
Here is the fixed-point iteration. This time we keep track of the whole sequence of approximations.
g(x) = x - p(x)
x = [2.1]
for k = 1:12
push!(x, g(x[k]))
end
x
13-element Vector{Float64}:
2.1
2.59
2.7419000000000002
2.69148439
2.713333728386328
2.7044887203327885
2.7081843632566587
2.7066592708954196
2.7072919457529734
2.7070300492259465
2.707138558717502
2.707093617492436
2.7071122335938966
It’s illuminating to construct and plot the sequence of errors.
err = @. abs(x - rmax)
plot(0:12, err;
m=:o,
xaxis=("iteration number"), yaxis=("error", :log10),
title="Convergence of fixed-point iteration")
It's quite clear that the convergence quickly settles into a linear rate. We could estimate this rate by doing a least-squares fit to a straight line. Keep in mind that the values for small $k$ should be left out of the computation, as they don't represent the linear trend.
y = log.(err[5:12])
p = Polynomials.fit(5:12, y, 1)
We can exponentiate the slope to get the convergence constant σ.
σ = exp(p.coeffs[2])
0.4144851385485472
The error should therefore decrease by a factor of σ at each iteration. We can check this easily from the observed data.
[err[i+1] / err[i] for i in 8:11]
4-element Vector{Float64}:
0.4137660520817109
0.4143987269383
0.4141368304124451
0.4142453399049934
The methods for finding σ agree well.
4.3 Newton’s method¶
Example 4.3.1
Suppose we want to find a root of the function $f(x) = x e^x - 2$.
f(x) = x * exp(x) - 2
using Plots
plot(f, 0, 1.5;
label="function", legend=:topleft,
grid=:y, ylim=[-2, 4], xlabel=L"x", ylabel=L"y")
From the graph, it is clear that there is a root near $x=1$. So we call that our initial guess, $x_1=1$.
x₁ = 1
y₁ = f(x₁)
scatter!([x₁], [y₁], label="initial point")
Next, we can compute the tangent line at the point $(x_1, f(x_1))$, using the derivative.
df_dx(x) = exp(x) * (x + 1)
m₁ = df_dx(x₁)
tangent = x -> y₁ + m₁ * (x - x₁)
plot!(tangent, 0, 1.5, l=:dash, label="tangent line",
title="Tangent line approximation")
In lieu of finding the root of $f$ itself, we settle for finding the root of the tangent line approximation, which is trivial. Call this $x_2$, our next approximation to the root.
@show x₂ = x₁ - y₁ / m₁
scatter!([x₂], [0], label="tangent root", title="First iteration")
y₂ = f(x₂)
0.06716266657572145
The residual (i.e., value of $f$) is smaller than before, but not zero. So we repeat the process with a new tangent line based on the latest point on the curve.
plot(f, 0.82, 0.87;
label="function", legend=:topleft,
xlabel=L"x", ylabel=L"y",
title="Second iteration")
scatter!([x₂], [y₂], label="starting point")
m₂ = df_dx(x₂)
tangent = x -> y₂ + m₂ * (x - x₂)
plot!(tangent, 0.82, 0.87; l=:dash, label="tangent line")
@show x₃ = x₂ - y₂ / m₂
scatter!([x₃], [0], label="tangent root")
y₃ = f(x₃)
0.0007730906446230534
Judging by the residual, we appear to be getting closer to the true root each time.
Example 4.3.2
We again look at finding a solution of $x e^x = 2$ near $x=1$. To apply Newton's method, we need to calculate values of both the residual function and its derivative.
f(x) = x * exp(x) - 2;
df_dx(x) = exp(x) * (x + 1);
We don’t know the exact root, so we use nlsolve
to determine a proxy for it.
using NLsolve
r = nlsolve(x -> f(x[1]), [1.0]).zero
1-element Vector{Float64}:
0.852605502013726
We use $x_1=1$ as a starting guess and apply the iteration in a loop, storing the sequence of iterates in a vector.
x = [1; zeros(4)]
for k = 1:4
x[k+1] = x[k] - f(x[k]) / df_dx(x[k])
end
x
5-element Vector{Float64}:
1.0
0.8678794411714423
0.8527833734164099
0.8526055263689221
0.852605502013726
Here is the sequence of errors.
ϵ = @. x - r
5-element Vector{Float64}:
0.14739449798627402
0.015273939157716354
0.00017787140268388235
2.435519608212644e-8
0.0
Because the error reaches machine epsilon so rapidly, we’re going to use extended precision to allow us to take a few more iterations. We’ll take the last iteration as the most accurate root estimate.
Tip
A BigFloat
uses 256 bits of precision, rather than 53 in Float64
. But arithmetic is done by software emulation and is much slower.
x = [BigFloat(1); zeros(7)]
for k = 1:7
x[k+1] = x[k] - f(x[k]) / df_dx(x[k])
end
r = x[end]
0.8526055020137254913464724146953174668984533001514035087721073946525150656742605
ϵ = @. Float64(x[1:end-1] - r)
7-element Vector{Float64}:
0.14739449798627452
0.01527393915771683
0.00017787140268443004
2.435519656311045e-8
4.56680051680793e-16
1.6056572825272187e-31
1.9848810119594387e-62
The exponents in the scientific notation definitely suggest a squaring sequence. We can check the evolution of the ratio in (4.3.9).
logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [1:7 ϵ logerr ratios]
The clear convergence to 2 above constitutes good evidence of quadratic convergence.
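This ratio behaves as expected for quadratic convergence: if $|\epsilon_{k+1}| \approx C\,|\epsilon_k|^2$, then taking logarithms gives

$$\log_{10} |\epsilon_{k+1}| \approx 2 \log_{10} |\epsilon_k| + \log_{10} C,$$

so once the errors are small, each log-error is roughly double the previous one and the ratios approach 2.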
Example 4.3.3
Suppose we want to evaluate the inverse of the function $g(x) = e^x - x$. This means solving $y = g(x)$ for $x$ when $y$ is given, which has no elementary form. If a value of $y$ is given numerically, though, we simply have a rootfinding problem for $f(x) = g(x) - y$.
Tip
The enumerate
function produces a pair of values for each iteration: a positional index and the corresponding contents.
g(x) = exp(x) - x
dg_dx(x) = exp(x) - 1
y = range(g(0), g(2), 200)
x = zeros(length(y))
for (i, y) in enumerate(y)
f(x) = g(x) - y
df_dx(x) = dg_dx(x)
r = FNC.newton(f, df_dx, y)
x[i] = r[end]
end
plot(g, 0, 2, aspect_ratio=1, label=L"g(x)")
plot!(y, x, label=L"g^{-1}(y)", title="Function and its inverse")
plot!(x -> x, 0, maximum(y), label="", l=(:dash, 1), color=:black)
4.4 Interpolation-based methods¶
Example 4.4.1
We return to finding a root of the equation $x e^x = 2$.
using Plots
f(x) = x * exp(x) - 2;
plot(f, 0.25, 1.25;
label="function", legend=:topleft,
xlabel=L"x", ylabel=L"y")
From the graph, it's clear that there is a root near $x=1$. To be more precise, there is a root in the interval $[0.5, 1]$. So let us take the endpoints of that interval as two initial approximations.
x₁ = 1;
y₁ = f(x₁);
x₂ = 0.5;
y₂ = f(x₂);
scatter!([x₁, x₂], [y₁, y₂];
label="initial points",
title="Two initial values")
Instead of constructing the tangent line by evaluating the derivative, we can construct a linear model function by drawing the line between the two points $(x_1, f(x_1))$ and $(x_2, f(x_2))$. This is called a secant line.
m₂ = (y₂ - y₁) / (x₂ - x₁)
secant = x -> y₂ + m₂ * (x - x₂)
plot!(secant, 0.25, 1.25, label="secant line", l=:dash, color=:black,
title="Secant line")
As before, the next root estimate in the iteration is the root of this linear model.
x₃ = x₂ - y₂ / m₂
@show y₃ = f(x₃)
scatter!([x₃], [0], label="root of secant", title="First iteration")
For the next linear model, we use the line through the two most recent points. The next iterate is the root of that secant line, and so on.
m₃ = (y₃ - y₂) / (x₃ - x₂)
x₄ = x₃ - y₃ / m₃
0.8656319273409482
Example 4.4.2
We check the convergence of the secant method from Demo 4.4.1. Again we will use extended precision to get a longer sequence than double precision allows.
f(x) = x * exp(x) - 2
x = FNC.secant(f, BigFloat(1), BigFloat(0.5), xtol=1e-80, ftol=1e-80);
We don’t know the exact root, so we use the last value as a proxy.
r = x[end]
0.8526055020137254913464724146953174668984533001514035087721073946525150656742605
Here is the sequence of errors.
ϵ = @. Float64(r - x[1:end-2])
11-element Vector{Float64}:
-0.14739449798627452
0.3526055020137255
0.04223372706144885
-0.013026425327222755
0.00042747994131549927
4.269915586133851e-6
-1.4054770126368277e-9
4.620323656624992e-15
4.999480931132388e-24
-1.7783862252641536e-38
6.845099610444838e-62
It’s not easy to see the convergence rate by staring at these numbers. We can use (4.4.8) to try to expose the superlinear convergence rate.
logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [eachindex(ϵ) ϵ logerr ratios]
As expected, this settles in at around 1.618.
Example 4.4.3
Here we look for a root of $x + \cos(10x)$ that is close to 1.
f(x) = x + cos(10 * x)
interval = [0.5, 1.5]
plot(f, interval..., label="Function", legend=:bottomright,
grid=:y, ylim=[-0.1, 3], xlabel=L"x", ylabel=L"y")
We choose three values to get the iteration started.
x = [0.8, 1.2, 1]
y = @. f(x)
scatter!(x, y, label="initial points")
If we were using forward interpolation, we would ask for the polynomial interpolant of $y$ as a function of $x$. But that parabola has no real roots.
using Polynomials
q = Polynomials.fit(x, y, 2) # interpolating polynomial
plot!(x -> q(x), interval..., l=:dash, label="interpolant")
To do inverse interpolation, we swap the roles of $x$ and $y$ in the interpolation.
Tip
By giving two functions in the plot call, we get the parametric plot $(q(y),y)$ as a function of $y$.
plot(f, interval..., label="Function",
    legend=:bottomright, grid=:y, xlabel=L"x", ylabel=L"y")
scatter!(x, y, label="initial points")
q = Polynomials.fit(y, x, 2) # interpolating polynomial
plot!(y -> q(y), y -> y, -0.1, 2.6, l=:dash, label="inverse interpolant")
We seek the value of $x$ that makes $y$ zero. This means evaluating $q$ at zero.
q(0)
Let's restart the process with `BigFloat` numbers to get a convergent sequence.
x = BigFloat.([8, 12, 10]) / 10
y = @. f(x)
for k = 3:12
    q = Polynomials.fit(y[k-2:k], x[k-2:k], 2)
    push!(x, q(0))
    push!(y, f(x[k+1]))
end
println("residual = $(f(x[end]))")
As far as our current precision is concerned, we have an exact root.
r = x[end]
ϵ = @. Float64(abs(r - x[1:end-1]))
logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [eachindex(ϵ) ϵ logerr ratios]
The convergence is probably superlinear at a rate of $\alpha=1.8$ or so.
4.5 Newton for nonlinear systems¶
Example 4.5.3
A system of nonlinear equations is defined by its residual and Jacobian.
Tip
Be careful when coding a Jacobian all in one statement. Spaces separate columns, so x[3]-1 is a single entry, while x[3] -1 (with a space before the minus sign only) is parsed as two separate entries.
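A small illustration of the rule (added here):

v = [2.0, 3.0, 4.0]
[v[1] v[3]-1 v[2]]     # 1×3 row: 2.0  3.0  3.0
[v[1] v[3] -1 v[2]]    # 1×4 row: 2.0  4.0  -1.0  3.0 (the stray space creates an extra column)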
function func(x)
[exp(x[2] - x[1]) - 2,
x[1] * x[2] + x[3],
x[2] * x[3] + x[1]^2 - x[2]
]
end;
function jac(x)
[
-exp(x[2] - x[1]) exp(x[2] - x[1]) 0
x[2] x[1] 1
2*x[1] x[3]-1 x[2]
]
end;
We will use a BigFloat
starting value, and commensurately small stopping tolerances, in order to get a sequence long enough to measure convergence.
x₁ = BigFloat.([0, 0, 0])
ϵ = eps(BigFloat)
x = FNC.newtonsys(func, jac, x₁, xtol=ϵ, ftol=ϵ);
Let’s compute the residual of the last result in order to check the quality.
r = x[end]
@show residual = norm(func(r));
residual = norm(func(r)) = 0.0
We take the sequence of norms of errors, applying the log so that we can look at the exponents.
logerr = [Float64(log(norm(r - x[k]))) for k in 1:length(x)-1]
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "log error", "ratio"] [eachindex(logerr) logerr ratios]
The ratio is neatly converging toward 2, which is expected for quadratic convergence.
4.6 Quasi-Newton methods¶
Example 4.6.1
To solve a nonlinear system, we need to code only the function defining the system, and not its Jacobian.
f(x) =
[
exp(x[2] - x[1]) - 2,
x[1] * x[2] + x[3],
x[2] * x[3] + x[1]^2 - x[2]
]
f (generic function with 1 method)
In all other respects usage is the same as for the newtonsys
function.
x₁ = [0.0, 0.0, 0.0]
x = FNC.levenberg(f, x₁)
12-element Vector{Vector{Float64}}:
[0.0, 0.0, 0.0]
[-0.08396946536317919, 0.07633587873004255, 0.0]
[-0.42205075841965206, 0.21991260740534585, 0.012997569823167984]
[-0.48610710938504953, 0.2138968287772044, 0.09771872586402451]
[-0.45628390809556546, 0.24211047709245145, 0.10100440258901365]
[-0.4556388336696561, 0.23470443548745376, 0.10854665717226099]
[-0.45839614510679244, 0.2353095686241835, 0.10739828073307472]
[-0.45804340381597397, 0.2351212406112955, 0.10768079583159752]
[-0.45803332584412787, 0.23511390840121466, 0.10768998049540802]
[-0.45803327880719313, 0.23511389867393448, 0.10768999250671268]
[-0.4580332805601996, 0.2351138998630789, 0.10768999097568899]
[-0.458033280641234, 0.23511389991865284, 0.10768999090414473]
It’s always a good idea to check the accuracy of the root, by measuring the residual (backward error).
r = x[end]
println("backward error = $(norm(f(r)))")
backward error = 1.2707848769787674e-13
Looking at the convergence in norm, we find a convergence rate between linear and quadratic, like with the secant method.
logerr = [log(norm(r - x[k])) for k in 1:length(x)-1]
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "log error", "ratio"] [eachindex(logerr) logerr ratios]
4.7 Nonlinear least squares¶
Example 4.7.1
We will observe the convergence of Function 4.6.3 for different levels of the minimum least-squares residual. We start with a function mapping from $\mathbb{R}^2$ into $\mathbb{R}^3$, and a point that will be near the optimum.
g(x) = [sin(x[1] + x[2]), cos(x[1] - x[2]), exp(x[1] - x[2])]
p = [1, 1];
The function obviously has a zero residual at $p$. We'll make different perturbations of that function in order to create nonzero residuals.
Tip
@sprintf
is a way to format numerical values as strings, patterned after the C function printf
.
using Printf
plt = plot(xlabel="iteration", yaxis=(:log10, "error"),
title="Convergence of Gauss–Newton")
for R in [1e-3, 1e-2, 1e-1]
# Define the perturbed function.
f(x) = g(x) - g(p) + R * normalize([-1, 1, -1])
x = FNC.levenberg(f, [0, 0])
r = x[end]
err = [norm(x - r) for x in x[1:end-1]]
normres = norm(f(r))
plot!(err, label=@sprintf("R=%.2g", normres))
end
plt
In the least perturbed case, where the minimized residual is less than $10^{-3}$, the convergence is plausibly quadratic. At the next level up, the convergence starts similarly but suddenly stagnates for a long time. In the most perturbed case, the quadratic phase is nearly gone and the overall shape looks linear.
Example 4.7.2
m = 25;
s = range(0.05, 6, length=m)
ŵ = @. 2 * s / (0.5 + s) # exactly on the curve
w = @. ŵ + 0.15 * cos(2 * exp(s / 16) * s); # smooth noise added
scatter(s, w, label="noisy data",
xlabel="s", ylabel="v", leg=:bottomright)
plot!(s, ŵ, l=:dash, color=:black, label="perfect data")
The idea is to pretend that we know nothing of the origins of this data and use nonlinear least squares to find the parameters in the theoretical model function $w(s) = \dfrac{V s}{K_m + s}$. In (4.7.2), $s$ and $w$ play the roles of the independent and dependent variables, respectively.
Tip
Putting comma-separated values on the left of an assignment will destructure the right-hand side, drawing individual assignments from entries of a vector, for example.
function misfit(x)
V, Km = x # rename components for clarity
return @. V * s / (Km + s) - w
end
misfit (generic function with 1 method)
In the Jacobian the derivatives are with respect to the parameters in $x$, namely $V$ and $K_m$.
function misfitjac(x)
V, Km = x # rename components for clarity
J = zeros(m, 2)
J[:, 1] = @. s / (Km + s) # dw/dV
J[:, 2] = @. -V * s / (Km + s)^2 # dw/d_Km
return J
end
misfitjac (generic function with 1 method)
x₁ = [1, 0.75]
x = FNC.newtonsys(misfit, misfitjac, x₁)
@show V, Km = x[end] # final values
(V, Km) = x[end] = [1.968652598378229, 0.4693037307416775]
2-element Vector{Float64}:
1.968652598378229
0.4693037307416775
The final values are reasonably close to the values $V=2$, $K_m=0.5$ that we used to generate the noise-free data. Graphically, the model looks close to the original data.
model(s) = V * s / (Km + s)
plot!(model, 0, 6, label="nonlinear fit")
For this particular model, we also have the option of linearizing the fit process. Rewrite the model as

$$\frac{1}{w} = \alpha \cdot \frac{1}{s} + \beta$$

for the new fitting parameters $\alpha = K_m/V$ and $\beta = 1/V$. This corresponds to the misfit function whose entries are

$$f_i([\alpha, \beta]) = \alpha \cdot \frac{1}{s_i} + \beta - \frac{1}{w_i}$$

for $i = 1, \ldots, m$. Although this misfit is nonlinear in $s$ and $w$, it's linear in the unknown parameters $\alpha$ and $\beta$. This lets us pose and solve it as a linear least-squares problem.
A = [s .^ (-1) s .^ 0]
u = 1 ./ w
α, β = A \ u
2-element Vector{Float64}:
0.12476333709901538
0.5713959100431231
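Since $\alpha = K_m/V$ and $\beta = 1/V$, these coefficients can be mapped back to the original parameters (a quick check added here):

V_lin, Km_lin = 1 / β, α / β    # invert β = 1/V and α = Km/V
@show V_lin, Km_lin;            # noticeably farther from (2, 0.5) than the nonlinear fit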
The two fits are different because they do not optimize the same quantities.
linmodel(x) = 1 / (β + α / x)
plot!(linmodel, 0, 6, label="linearized fit")
The truly nonlinear fit is clearly better in this case. It optimizes a residual for the original measured quantity rather than a transformed one we picked for algorithmic convenience.