Newton for nonlinear systems

The rootfinding problem becomes much more difficult when multiple variables and equations are involved. We now let \(\mathbf{f}\) be a vector function of a vector argument: \(\mathbf{f}\) maps from \(\mathbb{R}^n\) to \(\mathbb{R}^n\). By \(\mathbf{f}(\mathbf{x})=\boldsymbol{0}\) we mean the simultaneous system of \(n\) scalar equations,

\[\begin{split}\begin{split} f_1(x_1,\dots,x_n) &= 0\\ f_2(x_1,\dots,x_n) &= 0\\ &\vdots\\ f_n(x_1,\dots,x_n) &= 0. \end{split}\end{split}\]

When discussing a specific problem, it is often necessary to write out the equations componentwise, but in discussing methods for the general problem it’s more convenient to use the vector form. Proving the existence and uniqueness of a solution for any particular \(\mathbf{f}\) is typically quite difficult.

Linear model

To extend rootfinding methods to systems, we will keep to the basic philosophy of constructing easily managed models of the exact function. As usual, the starting point is a linear model. We base it on the multidimensional Taylor series,

(90)\[\mathbf{f}(\mathbf{x}+\mathbf{h}) = \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x})\mathbf{h} + O(\| \mathbf{h} \|^2),\]

where \(\mathbf{J}\) is called the Jacobian matrix of \(\mathbf{f}\) and is defined by

(91)\[\begin{split}\mathbf{J}(\mathbf{x}) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n}\\[2mm] \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n}\\[1mm] \vdots & \vdots & & \vdots\\[1mm] \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix} = \left[ \frac{\partial f_i}{\partial x_j} \right]_{i,j=1,\ldots,n}.\end{split}\]

Because of the Jacobian’s role in (90), we may write \(\mathbf{J}(\mathbf{x})\) as \(\mathbf{f}\,'(\mathbf{x})\). Like any derivative, it is a function of the independent variable \(\mathbf{x}\).

Example 37

Let

\[\begin{split}\begin{split} f_1(x_1,x_2,x_3) &= -x_1\cos(x_2) - 1\\ f_2(x_1,x_2,x_3) &= x_1x_2 + x_3\\ f_3(x_1,x_2,x_3) &= e^{-x_3}\sin(x_1+x_2) + x_1^2 - x_2^2. \end{split}\end{split}\]

Then

\[\begin{split} \mathbf{J}(\mathbf{x}) = \begin{bmatrix} -\cos(x_2) & x_1 \sin(x_2) & 0\\ x_2 & x_1 & 1\\ e^{-x_3}\cos(x_1+x_2)+2x_1 & e^{-x_3}\cos(x_1+x_2)-2x_2 & -e^{-x_3}\sin(x_1+x_2) \end{bmatrix}.\end{split}\]

If we were to start writing out the terms in (90), we would begin with

\[\begin{split}\begin{split} f_1(x_1+h_1,x_2+h_2,x_3+h_3) &= -x_1\cos(x_2)-1 -\cos(x_2)h_1 + x_1\sin(x_2)h_2 + O\bigl(\| \mathbf{h} \|^2\bigr) \\ f_2(x_1+h_1,x_2+h_2,x_3+h_3) &= x_1x_2 + x_3 + x_2h_1 +x_1h_2 + h_3 + O\bigl(\| \mathbf{h} \|^2\bigr), \end{split}\end{split}\]

and so on.
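Jacobians derived by hand, like the one above, are a frequent source of bugs, so it is worth checking them against a finite-difference approximation. The following Julia sketch does this for Example 37; the test point, the step size δ, and the helper code are our own illustrative choices, not part of the example.

using LinearAlgebra

# The system from Example 37...
f(x) = [ -x[1]*cos(x[2]) - 1,
         x[1]*x[2] + x[3],
         exp(-x[3])*sin(x[1]+x[2]) + x[1]^2 - x[2]^2 ]

# ...and its Jacobian, transcribed from the text.
function jac(x)
    J = zeros(3, 3)
    J[1,1] = -cos(x[2]);   J[1,2] = x[1]*sin(x[2]);   J[1,3] = 0
    J[2,1] = x[2];         J[2,2] = x[1];             J[2,3] = 1
    J[3,1] = exp(-x[3])*cos(x[1]+x[2]) + 2x[1]
    J[3,2] = exp(-x[3])*cos(x[1]+x[2]) - 2x[2]
    J[3,3] = -exp(-x[3])*sin(x[1]+x[2])
    return J
end

# Column j of the Jacobian is approximately (f(x+δe_j) - f(x)) / δ.
x = [1.0, -0.5, 2.0]       # arbitrary test point
δ = 1e-6
Jfd = zeros(3, 3)
for j in 1:3
    e = zeros(3);  e[j] = 1.0
    Jfd[:, j] = (f(x + δ*e) - f(x)) / δ
end
norm(Jfd - jac(x))          # should be small, on the order of δ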

The terms \(\mathbf{f}(\mathbf{x})+\mathbf{J}(\mathbf{x})\mathbf{h}\) in (90) represent the “linear part” of \(\mathbf{f}\) near \(\mathbf{x}\). If \(\mathbf{f}\) is actually linear, i.e., \(\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}-\mathbf{b}\), then the Jacobian matrix is the constant matrix \(\mathbf{A}\) and the higher order terms in (90) disappear.
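This is easy to confirm numerically: for a linear function, the remainder \(\mathbf{f}(\mathbf{x}+\mathbf{h})-\mathbf{f}(\mathbf{x})-\mathbf{J}\mathbf{h}\) vanishes up to roundoff no matter how large \(\mathbf{h}\) is. A minimal sketch, with \(\mathbf{A}\) and \(\mathbf{b}\) chosen arbitrarily for illustration:

A = [1.0 2.0; 3.0 4.0];  b = [5.0, 6.0]    # arbitrary data
flin(x) = A*x - b                          # linear, so the Jacobian is A everywhere
x = randn(2);  h = randn(2)
flin(x + h) - (flin(x) + A*h)              # zero up to roundoff: no higher-order remainder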

The multidimensional Newton iteration

With a method in hand for constructing a linear model for the vector system \(\mathbf{f}(\mathbf{x})\), we can generalize Newton’s method. Specifically, at a root estimate \(\mathbf{x}_k\), we set \(\mathbf{h} = \mathbf{x}-\mathbf{x}_k\) in (90) and get

\[\mathbf{f}(\mathbf{x}) \approx \mathbf{q}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k)(\mathbf{x}-\mathbf{x}_k).\]

We define the next iteration value \(\mathbf{x}_{k+1}\) by requiring \(\mathbf{q}(\mathbf{x}_{k+1})=\boldsymbol{0}\),

\[\boldsymbol{0} = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k)(\mathbf{x}_{k+1}-\mathbf{x}_k),\]

which can be rearranged into

(92)\[\mathbf{x}_{k+1} = \mathbf{x}_k - \bigl[\mathbf{J}(\mathbf{x}_k)\bigr]^{-1} \mathbf{f}(\mathbf{x}_k).\]

Note that \(\mathbf{J}^{-1}\mathbf{f}\) now plays the role that \(f/f'\) had in the scalar case; in fact the two are the same in one dimension. In computational practice, however, we don’t compute matrix inverses. Instead, define the Newton step \(\mathbf{s}_k\) by the linear \(n\times n\) system

(93)\[ \mathbf{J}(\mathbf{x}_k)\, \mathbf{s}_k = -\mathbf{f}(\mathbf{x}_k),\]

so that \(\mathbf{x}_{k+1}=\mathbf{x}_k+\mathbf{s}_k\). Computing the Newton step is equivalent to solving a linear system using the Jacobian matrix and the function value.
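In Julia, (93) is a single backslash solve per iteration. Here is a minimal sketch of one Newton step for an illustrative 2×2 system of our own choosing, the intersection of the unit circle with the line \(x_1=x_2\):

fc(x) = [x[1]^2 + x[2]^2 - 1, x[1] - x[2]]   # residual of the 2x2 system
Jc(x) = [2x[1] 2x[2]; 1.0 -1.0]              # its Jacobian
xk = [1.0, 0.0]               # current root estimate
s = -(Jc(xk) \ fc(xk))        # Newton step: solve J(x_k) s = -f(x_k)
xk1 = xk + s                  # next estimate, x_{k+1} = x_k + s_k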

An extension of our series analysis of the scalar Newton’s method shows that the vector version is also quadratically convergent in any vector norm, under suitable circumstances.

Implementation

An implementation of Newton’s method for systems is given in newtonsys.

Function 38 (newtonsys)

Newton’s method for a system of equations.

"""
newtonsys(f,jac,x1)

Use Newton's method to find a root of a system of equations,
starting from `x1`. The functions `f` and `jac should return the
residual vector and the Jacobian matrix, respectively. Returns
history of root estimates as a vector of vectors.
"""
function newtonsys(f,jac,x1)
    # Operating parameters.
    funtol = 1000*eps();  xtol = 1000*eps();  maxiter = 40;

    x = [float(x1)]
    y,J = f(x1),jac(x1)
    dx = Inf   # for initial pass below
    k = 1

    while (norm(dx) > xtol) && (norm(y) > funtol) && (k < maxiter)
        dx = -(J\y)             # Newton step
        push!(x,x[k] + dx)    # append to history
        k += 1
        y,J = f(x[k]),jac(x[k])
    end

    if k==maxiter
        @warn "Maximum number of iterations reached."
    end

    return x
end

Tip

The output of newtonsys is a vector of vectors representing the entire history of root estimates. Because the estimates are floating-point vectors even when the initial estimate has integer entries, the initial estimate is converted with float when the history vector x is initialized.

It is a remarkable effect of the vector-friendliness of Julia that this program is hardly different from the scalar version newton presented earlier:

  • The root estimates are stored as a vector of vectors, one vector per iteration, instead of as a vector of scalars.

  • The Newton step is calculated using a backslash.

  • The function “norm” is used for the magnitude of a vector, instead of “abs” for the magnitude of a scalar.

Indeed, newtonsys is a proper generalization—it can be used on scalar problems as well as on systems.
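As a usage sketch, we can feed newtonsys the functions f and jac written above for Example 37. The starting guess below is an arbitrary choice of ours; since Newton's method is only locally convergent, other guesses may stall, diverge, or find a different root.

using LinearAlgebra    # for norm

history = newtonsys(f, jac, [0.1, 0.1, 0.1])
residuals = [norm(f(xk)) for xk in history]
# If the iteration converges, these residual norms collapse rapidly,
# roughly squaring in magnitude at each step (quadratic convergence).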

Exercises

  1. ✍ Suppose that

    \[\begin{split}\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1x_2+x_2^2-1 \\[1mm] x_1x_2^3 + x_1^2x_2^2 + 1 \end{bmatrix}.\end{split}\]

    Let \(\mathbf{x}_1=[-2,1]^T\). Use Newton’s method to find \(\mathbf{x}_2\).

  2. ✍ Suppose that \(\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}\) for a constant \(n\times n\) matrix \(\mathbf{A}\) and constant \(n\times 1\) vector \(\mathbf{b}\). Show that Newton’s method converges to the exact root in one iteration.

  3. Two curves in the \((u,v)\) plane are defined implicitly by the equations \(u\log u + v \log v = -0.3\) and \(u^4 + v^2 = 1\).

    (a) ✍ Write the intersection of these curves in the form \(\mathbf{f}(\mathbf{x}) = \boldsymbol{0}\) for two-dimensional \(\mathbf{f}\) and \(\mathbf{x}\).

    (b) ✍ Find the Jacobian matrix of \(\mathbf{f}\).

    (c) ⌨ Use newtonsys to find an intersection point near \(u=1\), \(v=0.1\).

    (d) ⌨ Use newtonsys to find an intersection point near \(u=0.1\), \(v=1\).

  4. Two elliptical orbits \((x_1(t),y_1(t))\) and \((x_2(t),y_2(t))\) are described by the equations

    \[\begin{split}\begin{bmatrix} x_1(t) \\ y_1(t) \end{bmatrix} = \begin{bmatrix} -5+10\cos(t) \\ 6\sin(t) \end{bmatrix}, \qquad \begin{bmatrix} x_2(t)\\y_2(t) \end{bmatrix} = \begin{bmatrix} 8\cos(t) \\ 1+12\sin(t) \end{bmatrix},\end{split}\]

    where \(t\) represents time.

    (a) ⌨ Make a plot of the two orbits using the following code:

    using Plots
    x1(t) = -5+10*cos(t);   y1(t) = 6*sin(t);
    plot(x1,y1,0,2pi,aspect_ratio=1,legend=false)
    x2(t) = 8*cos(t);   y2(t) = 1+12*sin(t);
    plot!(x2,y2,0,2pi)
    

    (b) ✍ Write out a \(2\times 2\) nonlinear system of equations that describes an intersection of these orbits. (Note: An intersection is not the same as a collision—they don’t have to occupy the same point at the same time.)

    (c) ✍ Write out the Jacobian matrix of this nonlinear system.

    (d) ⌨ Use newtonsys to find all of the unique intersections.

  5. ⌨ Suppose one wants to find the points on the ellipsoid \(x^2/25 + y^2/16 + z^2/9 = 1\) that are closest to and farthest from the point \((5,4,3)\). The method of Lagrange multipliers implies that any such point satisfies

    \[\begin{split}\begin{split} x-5 &= \frac{\lambda x}{25} \\ y-4 &= \frac{\lambda y}{16} \\ z-3 &= \frac{\lambda z}{9} \\ 1 &= \frac{1}{25}x^2 + \frac{1}{16}y^2 + \frac{1}{9}z^2 \end{split}\end{split}\]

    for an unknown value of \(\lambda\).

    (a) Write out this system in the form \(\mathbf{f}(\mathbf{u}) = \boldsymbol{0}\). (Note that the system has four variables to go with the four equations.)

    (b) Write out the Jacobian matrix of this system.

    (c) Use newtonsys with different initial guesses to find the two roots of this system. Which is the closest point to \((5,4,3)\) and which is the farthest?

  6. ⌨ In this problem you are to fit a function of the form

    \[P(t) = a_1 + a_2 e^{a_3 t}\]

    to a subset of US census data for the twentieth century:

    Year                    1910   1930   1950   1970   1990
    Population (millions)   92.0   122.8  150.7  205.0  248.7

    (a) Determine the unknown parameters \(a_1\), \(a_2\), \(a_3\) in \(P\) by requiring that \(P\) exactly reproduce the data in the years 1910, 1950, and 1990. This creates three nonlinear equations for \(a_1\), \(a_2\), and \(a_3\) that may be solved using newtonsys.

    (b) To obtain convergence, rescale the data using the time variable \(t = (\text{year}-1900)/100\) and divide the population numbers above by \(100\). Using your model \(P(t)\), predict the result of the 2000 census, and compare it to the true figure of 284.1 million.