
Chapter 4

Functions

Newton’s method
newton.jl
"""
    newton(f, df_dx, x₁ [;maxiter, ftol, xtol])

Use Newton's method to find a root of `f` starting from `x₁`, where
`df_dx` is the derivative of `f`. Returns a vector of root estimates.

The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.
"""
function newton(f, df_dx, x₁; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
    x = [float(x₁)]
    y = f(x₁)
    Δx = Inf   # for initial pass below
    k = 1

    while (abs(Δx) > xtol) && (abs(y) > ftol)
        dy_dx = df_dx(x[k])
        Δx = -y / dy_dx            # Newton step
        push!(x, x[k] + Δx)        # append new estimate

        k += 1
        y = f(x[k])
        if k == maxiter
            @warn "Maximum number of iterations reached."
            break   # exit loop
        end
    end
    return x
end
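
As a minimal usage sketch (not part of the listing; the test function is chosen only for illustration), assuming `newton` has been defined as above:

f(x) = x^2 - 2
df_dx(x) = 2x
x = newton(f, df_dx, 1.0)     # returns the whole history of estimates
@show x[end] - sqrt(2);       # final error, near machine precision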
Secant method
secant.jl
"""
    secant(f, x₁, x₂ [;maxiter, ftol, xtol])

Use the secant method to find a root of `f` starting from `x₁` and
`x₂`. Returns a vector of root estimates.

The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.
"""
function secant(f, x₁, x₂; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
    x = [float(x₁), float(x₂)]
    y₁ = f(x₁)
    Δx, y₂ = Inf, Inf   # for initial pass in the loop below
    k = 2

    while (abs(Δx) > xtol) && (abs(y₂) > ftol)
        y₂ = f(x[k])
        Δx = -y₂ * (x[k] - x[k-1]) / (y₂ - y₁)   # secant step
        push!(x, x[k] + Δx)        # append new estimate

        k += 1
        y₁ = y₂    # current f-value becomes the old one next time

        if k == maxiter
            @warn "Maximum number of iterations reached."
            break   # exit loop
        end
    end
    return x
end
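
For comparison, a minimal usage sketch on the same illustrative test function, assuming `secant` has been defined as above:

f(x) = x^2 - 2
x = secant(f, 1.0, 2.0)       # two starting values near the root
@show x[end] - sqrt(2);       # final error, near machine precision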
Newton’s method for systems
newtonsys.jl
"""
    newtonsys(f, jac, x₁ [;maxiter, ftol, xtol])

Use Newton's method to find a root of a system of equations,
starting from `x₁`. The functions `f` and `jac` should return the
residual vector and the Jacobian matrix, respectively. Returns the
history of root estimates as a vector of vectors.

The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.

"""
function newtonsys(f, jac, x₁; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
    x = [float(x₁)]
    y, J = f(x₁), jac(x₁)
    Δx = Inf   # for initial pass below
    k = 1

    while (norm(Δx) > xtol) && (norm(y) > ftol)
        Δx = -(J \ y)             # Newton step
        push!(x, x[k] + Δx)    # append to history
        k += 1
        y, J = f(x[k]), jac(x[k])

        if k == maxiter
            @warn "Maximum number of iterations reached."
            break
        end
    end
    return x
end
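
A minimal usage sketch (the system is an illustrative choice, and `norm` from LinearAlgebra must be available to `newtonsys`): intersect the unit circle with the line $x_2=x_1$.

f(x) = [x[1]^2 + x[2]^2 - 1, x[1] - x[2]]
jac(x) = [2x[1] 2x[2]; 1 -1]
x = newtonsys(f, jac, [1.0, 0.0])
@show x[end];                 # should be close to [1/√2, 1/√2]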
Finite differences for Jacobian
fdjac.jl
"""
    fdjac(f, x₀ [,y₀])

Compute a finite-difference approximation of the Jacobian matrix for
`f` at `x₀`, where `y₀`=`f(x₀)` may be given.
"""
function fdjac(f, x₀, y₀ = f(x₀))
    δ = sqrt(eps()) * max(norm(x₀), 1)    # near-optimum FD step size
    m, n = length(y₀), length(x₀)
    if n == 1
        # Vector result for univariate functions.
        J = (f(x₀ + δ) - y₀) / δ
    else
        J = zeros(m, n)
        x = copy(x₀)
        for j in 1:n
            # Difference in the jth direction.
            x[j] += δ
            J[:, j] = (f(x) - y₀) / δ
            x[j] -= δ
        end
    end
    return J
end
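
As an illustrative sanity check (not part of the listing), we can compare the finite-difference Jacobian with a hand-coded one at a test point:

using LinearAlgebra                       # for norm
f(x) = [x[1] * x[2], x[1] + x[2]^2]
J_exact(x) = [x[2] x[1]; 1 2x[2]]
x₀ = [1.0, 2.0]
@show norm(fdjac(f, x₀) - J_exact(x₀));   # small, on the order of the finite-difference error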
Levenberg’s method
levenberg.jl
"""
    levenberg(f, x₁ [;maxiter, ftol, xtol])

Use Levenberg's quasi-Newton iteration to find a root of the system
`f` starting from `x₁`. Returns the history of root estimates as a
vector of vectors.

The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.

"""
function levenberg(f, x₁; maxiter = 40, ftol = 1e-12, xtol = 1e-12)
    x = [float(x₁)]
    yₖ = f(x₁)
    k = 1
    s = Inf
    A = fdjac(f, x[k], yₖ)   # start with FD Jacobian
    jac_is_new = true

    λ = 10
    while (norm(s) > xtol) && (norm(yₖ) > ftol)
        # Compute the proposed step.
        B = A' * A + λ * I
        z = A' * yₖ
        s = -(B \ z)

        x̂ = x[k] + s
        ŷ = f(x̂)

        # Do we accept the result?
        if norm(ŷ) < norm(yₖ)    # accept
            λ = λ / 10   # get closer to Newton
            # Broyden update of the Jacobian.
            A += (ŷ - yₖ - A * s) * (s' / (s' * s))
            jac_is_new = false

            push!(x, x̂)
            yₖ = ŷ
            k += 1
        else    # don't accept
            # Get closer to gradient descent.
            λ = 4λ
            # Re-initialize the Jacobian if it's out of date.
            if !jac_is_new
                A = fdjac(f, x[k], yₖ)
                jac_is_new = true
            end
        end

        if k == maxiter
            @warn "Maximum number of iterations reached."
            break
        end
    end
    return x
end
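
A minimal usage sketch (illustrative; `fdjac` and LinearAlgebra must be in scope for `levenberg` to run), reusing the circle-and-line system from above:

f(x) = [x[1]^2 + x[2]^2 - 1, x[1] - x[2]]
x = levenberg(f, [1.0, 0.0])
@show x[end];                 # should again approach [1/√2, 1/√2]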

Examples

4.1 The rootfinding problem

Example 4.1.1
using Plots, SpecialFunctions
J₃(x) = besselj(3, x)
plot(J₃, 0, 20;
    title="Bessel function",
    xaxis=(L"x"),  yaxis=(L"J_3(x)"),  grid=:xy)

From the graph we see roots near 6, 10, 13, 16, and 19. We use nlsolve from the NLsolve package to find these roots accurately. It uses vector variables, so we have to code accordingly.

using NLsolve
ω = []
for guess = [6., 10., 13., 16., 19.]
    s = nlsolve(x -> J₃(x[1]), [guess], ftol=1e-14)
    append!(ω, s.zero)
end
y = J₃.(ω)
@pt :header=["root estimate", "function value"] [ω y]
scatter!(ω, y, title="Bessel function with roots")

If instead we seek values at which $J_3(x)=0.2$, then we must find roots of the function $J_3(x)-0.2$.

r = []
for guess = [3., 6., 10., 13.]
    f(x) = J₃(x[1]) - 0.2
    s = nlsolve(f, [guess], ftol=1e-14)
    append!(r, s.zero)
end
scatter!(r, J₃.(r), title="Roots and other Bessel values")
Example 4.1.2

Consider first the function

f(x) = (x - 1) * (x - 2);

At the root $r=1$, we have $f'(r)=-1$. If the values of $f$ were perturbed at every point by a small amount of noise, we can imagine finding the root of the function drawn with a thick ribbon, giving a range of potential roots.

interval = [0.8, 1.2]

plot(f, interval..., ribbon=0.03, aspect_ratio=1,
    xlabel=L"x", yaxis=(L"f(x)", [-0.2, 0.2]))

scatter!([1], [0], title="Well-conditioned root")

The possible values for a perturbed root all lie within the interval where the ribbon intersects the $x$-axis. The width of that zone is about the same as the vertical thickness of the ribbon.

By contrast, consider the function

f(x) = (x - 1) * (x - 1.01);

Now $f'(1)=-0.01$, and the graph of $f$ will be much shallower near $x=1$. Look at the effect this has on our thick rendering:

plot(f, interval..., ribbon=0.03, aspect_ratio=1,
    xlabel=L"x", yaxis=(L"f(x)", [-0.2, 0.2]))

scatter!([1], [0], title="Poorly-conditioned root")

The vertical displacements in this picture are exactly the same as before. But the potential horizontal displacement of the root is much wider. In fact, if we perturb the function entirely upward by the amount drawn here, the root disappears!
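
A useful rule of thumb behind these pictures is that a vertical perturbation of size $\delta$ moves the root by roughly $\delta/|f'(r)|$. Here is a quick, illustrative calculation with the ribbon width used above (not part of the original demo):

δ = 0.03                 # vertical half-width of the ribbon in the plots
@show δ / abs(-1.0)      # well-conditioned case: f'(1) = -1, so the root moves by about 0.03
@show δ / abs(-0.01);    # poorly conditioned case: f'(1) = -0.01, so the root can move by about 3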

4.2 Fixed-point iteration

Example 4.2.1

Let’s convert the roots of a quadratic polynomial $f(x)$ to a fixed-point problem.

using Polynomials
p = Polynomial([3.5, -4, 1])
r = roots(p)
rmin, rmax = extrema(r)
@show rmin, rmax;
(rmin, rmax) = (1.2928932188134525, 2.7071067811865475)

We define $g(x)=x-p(x)$.

g(x) = x - p(x)
g (generic function with 1 method)

Intersections of $y=g(x)$ with the line $y=x$ are fixed points of $g$ and thus roots of $f$. (Only one is shown in the chosen plot range.)

using Plots
plt = plot([g x->x], 2, 3;
    l=2, label=[L"y=g(x)" L"y=x"],
    xlabel=L"x",  ylabel=L"y", 
    aspect_ratio=1,
    title="Finding a fixed point",  legend=:bottomright)

If we evaluate $g(2.1)$, we get a value of almost 2.6, so this is not a fixed point.

x = 2.1;
y = g(x)
2.59

However, $y=g(x)$ is considerably closer to the fixed point at around 2.7 than $x$ is. Suppose then that we adopt $y$ as our new $x$ value. Changing the $x$ coordinate in this way is the same as following a horizontal line over to the graph of $y=x$.

plot!([x, y], [y, y], arrow=true, color=3)

Now we can compute a new value for $y$. We leave $x$ alone here, so we travel along a vertical line to the graph of $g$.

x = y;  y = g(x)
plot!([x, x], [x, y], arrow=true, color=4)

You see that we are in a position to repeat these steps as often as we like. Let’s apply them a few times and see the result.

for k = 1:5
    plot!([x, y], [y, y], color=3);  
    x = y       # y becomes the new x
    y = g(x)    # g(x) becomes the new y
    plot!([x, x], [x, y], color=4)  
end
plt

The process spirals in beautifully toward the fixed point we seek. Our last estimate has almost 4 accurate digits.

abs(y - rmax) / rmax
0.0001653094344995643

Now let’s try to find the other fixed point $\approx 1.29$ in the same way. We’ll use 1.3 as a starting approximation.

plt = plot([g x->x], 1, 2, l=2, label=["y=g(x)" "y=x"], aspect_ratio=1, 
    xlabel=L"x", ylabel=L"y", title="Divergence", legend=:bottomright)

x = 1.3; y = g(x);
arrow = false
for k = 1:5
    plot!([x, y], [y, y], arrow=arrow, color=3)  
    x = y       # y --> new x
    y = g(x)    # g(x) --> new y
    plot!([x, x], [x, y], arrow=arrow, color=4)
    if k > 2; arrow = true; end
end
plt

This time, the iteration is pushing us away from the correct answer.
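
The divergence is explained by the derivative of $g$ at each fixed point: fixed-point iteration converges locally when $|g'(p)|<1$ and is repelled when $|g'(p)|>1$. Here is a quick, illustrative check using the roots computed above:

dg_dx(x) = 1 - (2x - 4)    # g(x) = x - p(x) with p(x) = x² - 4x + 3.5, so g'(x) = 1 - p'(x)
@show abs(dg_dx(rmax))     # ≈ 0.41 < 1: the iteration is attracted to this fixed point
@show abs(dg_dx(rmin));    # ≈ 2.41 > 1: the iteration is repelled from this one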

Example 4.2.3

We revisit Demo 4.2.1 and investigate the observed convergence more closely. Recall that above we calculated $g'(p)\approx-0.42$ at the convergent fixed point.

p = Polynomial([3.5, -4, 1])
r = roots(p)
rmin, rmax = extrema(r)
@show rmin, rmax;
(rmin, rmax) = (1.2928932188134525, 2.7071067811865475)

Here is the fixed-point iteration. This time we keep track of the whole sequence of approximations.

g(x) = x - p(x)
x = [2.1]
for k = 1:12
    push!(x, g(x[k]))
end
x
13-element Vector{Float64}: 2.1 2.59 2.7419000000000002 2.69148439 2.713333728386328 2.7044887203327885 2.7081843632566587 2.7066592708954196 2.7072919457529734 2.7070300492259465 2.707138558717502 2.707093617492436 2.7071122335938966

It’s illuminating to construct and plot the sequence of errors.

err = @. abs(x - rmax)
plot(0:12, err;
    m=:o,
    xaxis=("iteration number"),  yaxis=("error", :log10),
    title="Convergence of fixed-point iteration")

It’s quite clear that the convergence quickly settles into a linear rate. We could estimate this rate by doing a least-squares fit to a straight line. Keep in mind that the values for small $k$ should be left out of the computation, as they don’t represent the linear trend.

y = log.(err[5:12])
p = Polynomials.fit(5:12, y, 1)

We can exponentiate the slope to get the convergence constant σ.

σ = exp(p.coeffs[2])
0.4144851385485472

The error should therefore decrease by a factor of σ at each iteration. We can check this easily from the observed data.

[err[i+1] / err[i] for i in 8:11]
4-element Vector{Float64}: 0.4137660520817109 0.4143987269383 0.4141368304124451 0.4142453399049934

The methods for finding σ agree well.
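
Both estimates are consistent with the linear convergence theory, which says the error is multiplied by σ at every iteration. As one more illustrative check (not part of the original demo), we can predict the final error from an earlier one:

predicted = err[5] * σ^8    # start from the error at iteration k = 4 and apply 8 factors of σ
@show predicted, err[13];   # err[13] is the observed error at iteration k = 12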

4.3 Newton’s method

Example 4.3.1

Suppose we want to find a root of the function

f(x) = x * exp(x) - 2
using Plots
plot(f, 0, 1.5; 
    label="function",  legend=:topleft,
    grid=:y,  ylim=[-2, 4],  xlabel=L"x",  ylabel=L"y")

From the graph, it is clear that there is a root near $x=1$. So we call that our initial guess, $x_1$.

x₁ = 1
y₁ = f(x₁)
scatter!([x₁], [y₁], label="initial point")

Next, we can compute the tangent line at the point $\bigl(x_1,f(x_1)\bigr)$, using the derivative.

df_dx(x) = exp(x) * (x + 1)
m₁ = df_dx(x₁)
tangent = x -> y₁ + m₁ * (x - x₁)

plot!(tangent, 0, 1.5, l=:dash, label="tangent line",
    title="Tangent line approximation")

In lieu of finding the root of $f$ itself, we settle for finding the root of the tangent line approximation, which is trivial. Call this $x_2$, our next approximation to the root.

@show x₂ = x₁ - y₁ / m₁
scatter!([x₂], [0], label="tangent root", title="First iteration")
y₂ = f(x₂)
0.06716266657572145

The residual (i.e., the value of $f$) is smaller than before, but not zero. So we repeat the process with a new tangent line based on the latest point on the curve.

plot(f, 0.82, 0.87;
    label="function",  legend=:topleft,
    xlabel=L"x",  ylabel=L"y",
    title="Second iteration")

scatter!([x₂], [y₂], label="starting point")

m₂ = df_dx(x₂)
tangent = x -> y₂ + m₂ * (x - x₂)
plot!(tangent, 0.82, 0.87; l=:dash, label="tangent line")

@show x₃ = x₂ - y₂ / m₂
scatter!([x₃], [0], label="tangent root")
y₃ = f(x₃)
0.0007730906446230534

Judging by the residual, we appear to be getting closer to the true root each time.

Example 4.3.2

We again look at finding a solution of $x e^x=2$ near $x=1$. To apply Newton’s method, we need to calculate values of both the residual function $f$ and its derivative.

f(x) = x * exp(x) - 2;
df_dx(x) = exp(x) * (x + 1);

We don’t know the exact root, so we use nlsolve to determine a proxy for it.

using NLsolve
r = nlsolve(x -> f(x[1]), [1.0]).zero
1-element Vector{Float64}: 0.852605502013726

We use $x_1=1$ as a starting guess and apply the iteration in a loop, storing the sequence of iterates in a vector.

x = [1; zeros(4)]
for k = 1:4
    x[k+1] = x[k] - f(x[k]) / df_dx(x[k])
end
x
5-element Vector{Float64}: 1.0 0.8678794411714423 0.8527833734164099 0.8526055263689221 0.852605502013726

Here is the sequence of errors.

ϵ = @. x - r
5-element Vector{Float64}: 0.14739449798627402 0.015273939157716354 0.00017787140268388235 2.435519608212644e-8 0.0

Because the error reaches machine epsilon so rapidly, we’re going to use extended precision to allow us to take a few more iterations. We’ll take the last iteration as the most accurate root estimate.

x = [BigFloat(1); zeros(7)]
for k = 1:7
    x[k+1] = x[k] - f(x[k]) / df_dx(x[k])
end
r = x[end]
0.8526055020137254913464724146953174668984533001514035087721073946525150656742605
ϵ = @. Float64(x[1:end-1] - r)
7-element Vector{Float64}: 0.14739449798627452 0.01527393915771683 0.00017787140268443004 2.435519656311045e-8 4.56680051680793e-16 1.6056572825272187e-31 1.9848810119594387e-62

The exponents in the scientific notation definitely suggest a squaring sequence. We can check the evolution of the ratio in (4.3.9).

logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [1:7 ϵ logerr ratios]

The clear convergence to 2 above constitutes good evidence of quadratic convergence.
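
Equivalently, quadratic convergence for Newton’s method means $\epsilon_{k+1} \approx C\,\epsilon_k^2$ with $C = \bigl|f''(r)/\bigl(2f'(r)\bigr)\bigr|$. As an illustrative check on the error sequence computed above, the ratios below settle quickly near a constant (about 0.77 for this function):

[ϵ[k+1] / ϵ[k]^2 for k in 1:6]    # should approach |f''(r) / (2f'(r))|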

Example 4.3.3

Suppose we want to evaluate the inverse of the function $g(x)=e^x-x$. This means solving $y=e^x-x$ for $x$ when $y$ is given, which has no elementary form. If a value of $y$ is given numerically, though, we simply have a rootfinding problem for $f(x)=e^x-x-y$.

g(x) = exp(x) - x
dg_dx(x) = exp(x) - 1
y = range(g(0), g(2), 200)
x = zeros(length(y))
for (i, y) in enumerate(y)
    f(x) = g(x) - y
    df_dx(x) = dg_dx(x)
    r = FNC.newton(f, df_dx, y)
    x[i] = r[end]
end

plot(g, 0, 2, aspect_ratio=1, label=L"g(x)")
plot!(y, x, label=L"g^{-1}(y)", title="Function and its inverse")
plot!(x -> x, 0, maximum(y), label="", l=(:dash, 1), color=:black)

4.4 Interpolation-based methods

Example 4.4.1

We return to finding a root of the equation $x e^x=2$.

using Plots
f(x) = x * exp(x) - 2;
plot(f, 0.25, 1.25;
    label="function",  legend=:topleft,
    xlabel=L"x",  ylabel=L"y")

From the graph, it’s clear that there is a root near $x=1$. To be more precise, there is a root in the interval $[0.5,1]$. So let us take the endpoints of that interval as two initial approximations.

x₁ = 1;
y₁ = f(x₁);
x₂ = 0.5;
y₂ = f(x₂);
scatter!([x₁, x₂], [y₁, y₂];
    label="initial points",
    title="Two initial values")

Instead of constructing the tangent line by evaluating the derivative, we can construct a linear model function by drawing the line between the two points $\bigl(x_1,f(x_1)\bigr)$ and $\bigl(x_2,f(x_2)\bigr)$. This is called a secant line.

m₂ = (y₂ - y₁) / (x₂ - x₁)
secant = x -> y₂ + m₂ * (x - x₂)
plot!(secant, 0.25, 1.25, label="secant line", l=:dash, color=:black,
    title="Secant line")

As before, the next root estimate in the iteration is the root of this linear model.

x₃ = x₂ - y₂ / m₂
@show y₃ = f(x₃)
scatter!([x₃], [0], label="root of secant", title="First iteration")

For the next linear model, we use the line through the two most recent points. The next iterate is the root of that secant line, and so on.

m₃ = (y₃ - y₂) / (x₃ - x₂)
x₄ = x₃ - y₃ / m₃
0.8656319273409482
Example 4.4.2

We check the convergence of the secant method from Demo 4.4.1. Again we will use extended precision to get a longer sequence than double precision allows.

f(x) = x * exp(x) - 2
x = FNC.secant(f, BigFloat(1), BigFloat(0.5), xtol=1e-80, ftol=1e-80);

We don’t know the exact root, so we use the last value as a proxy.

r = x[end]
0.8526055020137254913464724146953174668984533001514035087721073946525150656742605

Here is the sequence of errors.

ϵ = @. Float64(r - x[1:end-2])
11-element Vector{Float64}: -0.14739449798627452 0.3526055020137255 0.04223372706144885 -0.013026425327222755 0.00042747994131549927 4.269915586133851e-6 -1.4054770126368277e-9 4.620323656624992e-15 4.999480931132388e-24 -1.7783862252641536e-38 6.845099610444838e-62

It’s not easy to see the convergence rate by staring at these numbers. We can use (4.4.8) to try to expose the superlinear convergence rate.

logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [eachindex(ϵ) ϵ logerr ratios]

As expected, this settles in at around 1.618.
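
This is no accident: the theoretical convergence rate of the secant method is the golden ratio,

$$
\alpha = \frac{1+\sqrt{5}}{2} \approx 1.618.
$$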

Example 4.4.3

Here we look for a root of $x+\cos(10x)$ that is close to 1.

f(x) = x + cos(10 * x)
interval = [0.5, 1.5]

plot(f, interval..., label="Function", legend=:bottomright,
    grid=:y, ylim=[-0.1, 3], xlabel=L"x", ylabel=L"y")

We choose three values to get the iteration started.

x = [0.8, 1.2, 1]
y = @. f(x)
scatter!(x, y, label="initial points")

If we were using forward interpolation, we would ask for the polynomial interpolant of $y$ as a function of $x$. But that parabola has no real roots.

using Polynomials
q = Polynomials.fit(x, y, 2)      # interpolating polynomial
plot!(x -> q(x), interval..., l=:dash, label="interpolant")

To do inverse interpolation, we swap the roles of $x$ and $y$ in the interpolation.

```{tip}
:class: dropdown
By giving two functions in the plot call, we get the parametric plot $(q(y),y)$ as a function of $y$.
```

```{code-cell}
plot(f, interval..., label="Function",
    legend=:bottomright, grid=:y, xlabel=L"x", ylabel=L"y")
scatter!(x, y, label="initial points")

q = Polynomials.fit(y, x, 2)       # interpolating polynomial
plot!(y -> q(y), y -> y, -0.1, 2.6, l=:dash, label="inverse interpolant")
```

We seek the value of $x$ that makes $y$ zero. This means evaluating $q$ at zero.

```{code-cell}
q(0)
```

Let's restart the process with `BigFloat` numbers to get a convergent sequence.

```{code-cell}
x = BigFloat.([8, 12, 10]) / 10
y = @. f(x)

for k = 3:12
    q = Polynomials.fit(y[k-2:k], x[k-2:k], 2)
    push!(x, q(0))
    push!(y, f(x[k+1]))
end

println("residual = $(f(x[end]))")
```

As far as our current precision is concerned, we have an exact root.

```{code-cell}
r = x[end]
ϵ = @. Float64(abs(r - x[1:end-1]))
logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [eachindex(ϵ) ϵ logerr ratios]
```

The convergence is probably superlinear at a rate of $\alpha=1.8$ or so.

4.5 Newton for nonlinear systems

Example 4.5.3

A system of nonlinear equations is defined by its residual and Jacobian.

function func(x)
    [exp(x[2] - x[1]) - 2,
        x[1] * x[2] + x[3],
        x[2] * x[3] + x[1]^2 - x[2]
    ]
end;

function jac(x)
    [
        -exp(x[2] - x[1]) exp(x[2] - x[1]) 0
        x[2] x[1] 1
        2*x[1] x[3]-1 x[2]
    ]
end;
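
Before iterating, it can be reassuring to verify the hand-coded Jacobian against a finite-difference approximation. Here is a quick, illustrative check using the fdjac function listed earlier (accessed as FNC.fdjac, matching how the other functions are called in these examples; the test point is arbitrary):

using LinearAlgebra        # for norm
x_test = [0.1, -0.2, 0.3]
@show norm(jac(x_test) - FNC.fdjac(func, x_test));   # should be tiny, at the level of finite-difference error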

We will use a BigFloat starting value, and commensurately small stopping tolerances, in order to get a sequence long enough to measure convergence.

x₁ = BigFloat.([0, 0, 0])
ϵ = eps(BigFloat)
x = FNC.newtonsys(func, jac, x₁, xtol=ϵ, ftol=ϵ);

Let’s compute the residual of the last result in order to check the quality.

r = x[end]
@show residual = norm(func(r));
residual = norm(func(r)) = 0.0

We take the sequence of norms of errors, applying the log so that we can look at the exponents.

logerr = [Float64(log(norm(r - x[k]))) for k in 1:length(x)-1]
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "log error", "ratio"] [eachindex(logerr) logerr ratios]

The ratio is neatly converging toward 2, which is expected for quadratic convergence.

4.6 Quasi-Newton methods

Example 4.6.1

To solve a nonlinear system, we need to code only the function defining the system, and not its Jacobian.

f(x) = 
    [
        exp(x[2] - x[1]) - 2,
        x[1] * x[2] + x[3],
        x[2] * x[3] + x[1]^2 - x[2]
    ]
f (generic function with 1 method)

In all other respects usage is the same as for the newtonsys function.

x₁ = [0.0, 0.0, 0.0]
x = FNC.levenberg(f, x₁)
12-element Vector{Vector{Float64}}: [0.0, 0.0, 0.0] [-0.08396946536317919, 0.07633587873004255, 0.0] [-0.42205075841965206, 0.21991260740534585, 0.012997569823167984] [-0.48610710938504953, 0.2138968287772044, 0.09771872586402451] [-0.45628390809556546, 0.24211047709245145, 0.10100440258901365] [-0.4556388336696561, 0.23470443548745376, 0.10854665717226099] [-0.45839614510679244, 0.2353095686241835, 0.10739828073307472] [-0.45804340381597397, 0.2351212406112955, 0.10768079583159752] [-0.45803332584412787, 0.23511390840121466, 0.10768998049540802] [-0.45803327880719313, 0.23511389867393448, 0.10768999250671268] [-0.4580332805601996, 0.2351138998630789, 0.10768999097568899] [-0.458033280641234, 0.23511389991865284, 0.10768999090414473]

It’s always a good idea to check the accuracy of the root, by measuring the residual (backward error).

r = x[end]
println("backward error = $(norm(f(r)))")
backward error = 1.2707848769787674e-13

Looking at the convergence in norm, we find a convergence rate between linear and quadratic, like with the secant method.

logerr = [log(norm(r - x[k])) for k in 1:length(x)-1]
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "log error", "ratio"] [eachindex(logerr) logerr ratios]

4.7 Nonlinear least squares

Example 4.7.1

We will observe the convergence of Function 4.6.3 for different levels of the minimum least-squares residual. We start with a function mapping from $\mathbb{R}^2$ into $\mathbb{R}^3$, and a point that will be near the optimum.

g(x) = [sin(x[1] + x[2]), cos(x[1] - x[2]), exp(x[1] - x[2])]
p = [1, 1];

The function $\mathbf{g}(\mathbf{x}) - \mathbf{g}(\mathbf{p})$ obviously has a zero residual at $\mathbf{p}$. We’ll make different perturbations of that function in order to create nonzero residuals.

using Printf
plt = plot(xlabel="iteration", yaxis=(:log10, "error"),
    title="Convergence of Gauss–Newton")
for R in [1e-3, 1e-2, 1e-1]
    # Define the perturbed function.
    f(x) = g(x) - g(p) + R * normalize([-1, 1, -1])
    x = FNC.levenberg(f, [0, 0])
    r = x[end]
    err = [norm(x - r) for x in x[1:end-1]]
    normres = norm(f(r))
    plot!(err, label=@sprintf("R=%.2g", normres))
end
plt

In the least perturbed case, where the minimized residual is less than $10^{-3}$, the convergence is plausibly quadratic. At the next level up, the convergence starts similarly but suddenly stagnates for a long time. In the most perturbed case, the quadratic phase is nearly gone and the overall shape looks linear.

Example 4.7.2
m = 25;
s = range(0.05, 6, length=m)
ŵ = @. 2 * s / (0.5 + s)                      # exactly on the curve
w = @. ŵ + 0.15 * cos(2 * exp(s / 16) * s);     # smooth noise added
scatter(s, w, label="noisy data",
    xlabel="s", ylabel="v", leg=:bottomright)
plot!(s, ŵ, l=:dash, color=:black, label="perfect data")

The idea is to pretend that we know nothing of the origins of this data and use nonlinear least squares to find the parameters in the theoretical model function $v(s)$. In (4.7.2), the $s$ variable plays the role of $t$, and $v$ plays the role of $g$.

function misfit(x)
    V, Km = x   # rename components for clarity
    return @. V * s / (Km + s) - w
end
misfit (generic function with 1 method)

In the Jacobian, the derivatives are taken with respect to the parameters in $\mathbf{x}$.

function misfitjac(x)
    V, Km = x   # rename components for clarity
    J = zeros(m, 2)
    J[:, 1] = @. s / (Km + s)              # dw/dV
    J[:, 2] = @. -V * s / (Km + s)^2         # dw/d_Km
    return J
end
misfitjac (generic function with 1 method)
x₁ = [1, 0.75]
x = FNC.newtonsys(misfit, misfitjac, x₁)

@show V, Km = x[end]  # final values
(V, Km) = x[end] = [1.968652598378229, 0.4693037307416775]

2-element Vector{Float64}: 1.968652598378229 0.4693037307416775

The final values are reasonably close to the values $V=2$, $K_m=0.5$ that we used to generate the noise-free data. Graphically, the model looks close to the original data.

model(s) = V * s / (Km + s)
plot!(model, 0, 6, label="nonlinear fit")

For this particular model, we also have the option of linearizing the fit process. Rewrite the model as

$$
\frac{1}{w} = \frac{\alpha}{s} + \beta = \alpha \cdot s^{-1} + \beta
$$

for the new fitting parameters $\alpha=K_m/V$ and $\beta=1/V$. This corresponds to the misfit function whose entries are

$$
f_i([\alpha,\beta]) = \left(\alpha \cdot \frac{1}{s_i} + \beta\right) - \frac{1}{w_i}
$$

for $i=1,\ldots,m$. Although this misfit is nonlinear in $s$ and $w$, it’s linear in the unknown parameters $\alpha$ and $\beta$. This lets us pose and solve it as a linear least-squares problem.

A = [s .^ (-1) s .^ 0]
u = 1 ./ w
α, β = A \ u
2-element Vector{Float64}: 0.12476333709901538 0.5713959100431231

The two fits are different because they do not optimize the same quantities.

linmodel(x) = 1 / (β + α / x)
plot!(linmodel, 0, 6, label="linearized fit")

The truly nonlinear fit is clearly better in this case. It optimizes a residual for the original measured quantity rather than a transformed one we picked for algorithmic convenience.
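
This visual judgment can be backed up numerically. As an illustrative check (assuming LinearAlgebra is loaded for norm), compare each model’s misfit in the measured variable $w$:

using LinearAlgebra
@show norm(model.(s) - w)       # residual of the nonlinear fit in the measured variable
@show norm(linmodel.(s) - w);   # residual of the linearized fit, which comes out larger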