Skip to article frontmatterSkip to article content

Numerical integration

In calculus you learn that the elegant way to evaluate a definite integral is to apply the Fundamental Theorem of Calculus and find an antiderivative. The connection is so profound and pervasive that it’s easy to overlook that a definite integral is a numerical quantity existing independently of antidifferentiation. However, most conceivable integrands have no antiderivative in terms of familiar functions.

Numerical integration, which also goes by the older name quadrature, is performed by combining values of the integrand sampled at nodes. In this section we will assume equally spaced nodes using the definitions

ti=a+ih,h=ban,i=0,,n. t_i = a +i h, \quad h=\frac{b-a}{n}, \qquad i=0,\ldots,n.

Numerical integration formulas can be applied to sequences of data values even if no function is explicitly known to generate them. For our presentation and implementations, however, we assume that ff is known and can be evaluated anywhere.

A straightforward way to derive integration formulas is to mimic the approach taken for finite differences: find an interpolant and operate exactly on it.

5.6.1Trapezoid formula

One of the most important integration formulas results from integration of the piecewise linear interpolant (see Piecewise linear interpolation). Using the cardinal basis form of the interpolant in (5.2.3), we have

abf(x)dxabi=0nf(ti)Hi(x)dx=i=0nf(ti)[abHi(x)]dx.\int_a^b f(x) \, dx \approx \int_a^b \sum_{i=0}^n f(t_i) H_i(x)\, dx = \sum_{i=0}^n f(t_i) \left[ \int_a^b H_i(x)\right]\, dx.

Thus we can identify the weights as wi=h1abHi(x)dxw_i = h^{-1} \int_a^b H_i(x)\, dx. Using areas of triangles, it’s trivial to derive that

wi={1,i=1,,n1,12,i=0,n.w_i = \begin{cases} 1, & i=1,\ldots,n-1,\\ \frac{1}{2}, & i=0,n. \end{cases}

Putting everything together, the resulting formula is

abf(x)dxTf(n)=h[12f(t0)+f(t1)+f(t2)++f(tn1)+12f(tn)].\begin{split} \int_a^b f(x)\, dx \approx T_f(n) &= h\left[ \frac{1}{2}f(t_0) + f(t_1) + f(t_2) + \cdots + f(t_{n-1}) + \frac{1}{2}f(t_n) \right]. \end{split}

Geometrically, as illustrated in Figure 5.6.1, the trapezoid formula sums of the areas of trapezoids approximating the region under the curve y=f(x)y=f(x).[1]

The trapezoid formula is the Swiss Army knife of integration formulas. A short implementation is given as Function 5.6.1.

Trapezoid formula for integration. The piecewise linear interpolant defines trapezoids that approximate the region under the curve.

Figure 5.6.1:Trapezoid formula for integration. The piecewise linear interpolant defines trapezoids that approximate the region under the curve.

Like finite-difference formulas, numerical integration formulas have a truncation error.

In Theorem 5.2.2 we stated that the pointwise error in a piecewise linear interpolant with equal node spacing hh is bounded by O(h2)O(h^2) as h0h\rightarrow 0. Using II to stand for the exact integral of ff and pp to stand for the piecewise linear interpolant, we obtain

ITf(n)=Iabp(x)dx=ab[f(x)p(x)]dx(ba)maxx[a,b]f(x)p(x)=O(h2).\begin{split} I - T_f(n) = I - \int_a^b p(x)\, dx &= \int_a^b \bigl[f(x)-p(x)\bigr] \, dx \\ &\le (b-a) \max_{x\in[a,b]} |f(x)-p(x)| = O(h^2). \end{split}

A more thorough statement of the truncation error is known as the Euler–Maclaurin formula,

abf(x)dx=Tf(n)h212[f(b)f(a)]+h4740[f(b)f(a)]+O(h6)=Tf(n)k=1B2kh2k(2k)![f(2k1)(b)f(2k1)(a)],\begin{split} \int_a^b f(x)\, dx &= T_f(n) - \frac{h^2}{12} \left[ f'(b)-f'(a) \right] + \frac{h^4}{740} \left[ f'''(b)-f'''(a) \right] + O(h^6) \\ &= T_f(n) - \sum_{k=1}^\infty \frac{B_{2k}h^{2k}}{(2k)!} \left[ f^{(2k-1)}(b)-f^{(2k-1)}(a) \right], \end{split}

where the B2kB_{2k} are constants known as Bernoulli numbers. Unless we happen to be fortunate enough to have a function with f(b)=f(a)f'(b)=f'(a), we should expect truncation error at second order and no better.

5.6.2Extrapolation

If evaluations of ff are computationally expensive, we want to get as much accuracy as possible from them by using a higher-order formula. There are many routes for doing so; for example, we could integrate a not-a-knot cubic spline interpolant. However, splines are difficult to compute by hand, and as a result different methods were developed before computers came on the scene.

Knowing the structure of the error allows the use of extrapolation to improve accuracy. Suppose a quantity A0A_0 is approximated by an algorithm A(h)A(h) with an error expansion

A0=A(h)+c1h+c2h2+c3h3+. A_0 = A(h) + c_1 h + c_2 h^2 + c_3 h^3 + \cdots.

Crucially, it is not necessary to know the values of the error constants ckc_k, merely that they exist and are independent of hh.

Using II for the exact integral of ff, the trapezoid formula has

I=Tf(n)+c2h2+c4h4+, I = T_f(n) + c_2 h^2 + c_4 h^{4} + \cdots,

as proved by the Euler–Maclaurin formula (5.6.9). The error constants depend on ff and can’t be evaluated in general, but we know that this expansion holds. For convenience we recast the error expansion in terms of n=O(h1)n=O(h^{-1}):

I=Tf(n)+c2n2+c4n4+. I = T_f(n) + c_2 n^{-2} + c_4 n^{-4} + \cdots.

We now make the simple observation that

I=Tf(2n)+14c2n2+116c4n4+. I = T_f(2n) + \tfrac{1}{4} c_2 n^{-2} + \tfrac{1}{16} c_4 n^{-4} + \cdots.

It follows that if we combine (5.6.12) and (5.6.13) correctly, we can cancel out the second-order term in the error. Specifically, define

Sf(2n)=13[4Tf(2n)Tf(n)]. S_f(2n) = \frac{1}{3} \Bigl[ 4 T_f(2n) - T_f(n) \Bigr].

(We associate 2n2n rather than nn with the extrapolated result because of the total number of nodes needed.) Then

I=Sf(2n)+O(n4)=b4n4+b6n6+. I = S_f(2n) + O(n^{-4}) = b_4 n^{-4} + b_6 n^{-6} + \cdots.

The formula (5.6.14) is called Simpson’s formula, or Simpson’s rule. A different presentation and derivation are considered in Exercise 4.

Equation (5.6.15) is another particular error expansion in the form (5.6.10), so we can extrapolate again! The details change only a little. Considering that

I=Sf(4n)=116b4n4+164b6n6+, I = S_f(4n) = \tfrac{1}{16} b_4 n^{-4} + \tfrac{1}{64} b_6 n^{-6} + \cdots,

the proper combination this time is

Rf(4n)=115[16Sf(4n)Sf(2n)], R_f(4n) = \frac{1}{15} \Bigl[ 16 S_f(4n) - S_f(2n) \Bigr],

which is sixth-order accurate. Clearly the process can be repeated to get eighth-order accuracy and beyond. Doing so goes by the name of Romberg integration, which we will not present in full generality.

5.6.3Node doubling

Note in (5.6.17) that Rf(4n)R_f(4n) depends on Sf(2n)S_f(2n) and Sf(4n)S_f(4n), which in turn depend on Tf(n)T_f(n), Tf(2n)T_f(2n), and Tf(4n)T_f(4n). There is a useful benefit realized by doubling of the nodes in each application of the trapezoid formula. As shown in Figure 5.6.2, when doubling nn, only about half of the nodes are new ones, and previously computed function values at the other nodes can be reused.

Dividing the node spacing by half introduces new nodes only at midpoints, allowing the function values at existing nodes to be reused for extrapolation.

Figure 5.6.2:Dividing the node spacing by half introduces new nodes only at midpoints, allowing the function values at existing nodes to be reused for extrapolation.

Specifically, we have

Tf(2m)=12m[12f(a)+12f(b)+i=12m1f(a+i2m)]=12m[12f(a)+12f(b)]+12mk=1m1f(a+2k2m)+12mk=1mf(a+2k12m)=12m[12f(a)+12f(b)+k=1m1f(a+km)]+12mk=1mf(a+2k12m)=12Tf(m)+12mk=1m1f(t2k1),\begin{split} T_f(2m) & = \frac{1}{2m} \left[ \frac{1}{2} f(a) + \frac{1}{2} f(b) + \sum_{i=1}^{2m-1} f\Bigl( a + \frac{i}{2m} \Bigr) \right]\\[1mm] & = \frac{1}{2m} \left[ \frac{1}{2} f(a) + \frac{1}{2} f(b)\right] + \frac{1}{2m} \sum_{k=1}^{m-1} f\Bigl( a+\frac{2k}{2m} \Bigr) + \frac{1}{2m} \sum_{k=1}^{m} f\Bigl( a+\frac{2k-1}{2m} \Bigr) \\[1mm] &= \frac{1}{2m} \left[ \frac{1}{2} f(a) + \frac{1}{2} f(b) + \sum_{k=1}^{m-1} f\Bigl( a+\frac{k}{m} \Bigr) \right] + \frac{1}{2m} \sum_{k=1}^{m} f\Bigl( a+\frac{2k-1}{2m} \Bigr) \\[1mm] &= \frac{1}{2} T_f(m) + \frac{1}{2m} \sum_{k=1}^{m-1} f\left(t_{2k-1} \right), \end{split}

where the nodes referenced in the last line are relative to n=2mn=2m. Hence in passing from n=mn=m to n=2mn=2m, new integrand evaluations are needed only at the odd-numbered nodes of the finer grid.

5.6.4Exercises

  1. ⌨ For each integral below, use Function 5.6.1 to estimate the integral for n=102kn=10\cdot 2^k nodes for k=1,2,,10k=1,2,\ldots,10. Make a log-log plot of the errors and confirm or refute second-order accuracy. (These integrals were taken from Bailey et al. (2005).)

    (a) 01xlog(1+x)dx=14\displaystyle \int_0^1 x\log(1+x)\, dx = \frac{1}{4}

    (b) 01x2tan1xdx=π2+2log212\displaystyle \int_0^1 x^2 \tan^{-1}x\, dx = \frac{\pi-2+2\log 2}{12}

    (c) 0π/2excosxdx=eπ/212\displaystyle \int_0^{\pi/2}e^x \cos x\, dx = \frac{e^{\pi/2}-1}{2}

    (d) 01xlog(x)dx=49\displaystyle \int_0^1 \sqrt{x} \log(x) \, dx = -\frac{4}{9} (Note: Although the integrand has the limiting value zero as x0x\to 0, it cannot be evaluated naively at x=0x=0. You can start the integral at x=ϵmachx=\macheps instead.)

    (e) 011x2dx=π4\displaystyle \int_0^1 \sqrt{1-x^2}\,\, dx = \frac{\pi}{4}

  2. ✍ The Euler–Maclaurin error expansion (5.6.9) for the trapezoid formula implies that if we could cancel out the term due to f(b)f(a)f'(b)-f'(a), we would obtain fourth-order accuracy. We should not assume that ff' is available, but approximating it with finite differences can achieve the same goal. Suppose the forward difference formula (5.4.13) is used for f(a)f'(a), and its reflected backward difference is used for f(b)f'(b). Show that the resulting modified trapezoid formula is

    Gf(h)=Tf(h)h24[3(f(tn)+f(t0))4(f(tn1)+f(t1))+(f(tn2)+f(t2))], G_f(h) = T_f(h) - \frac{h}{24} \left[ 3\Bigl( f(t_n)+f(t_0) \Bigr) -4\Bigr( f(t_{n-1}) + f(t_1) \Bigr) + \Bigl( f(t_{n-2})+f(t_2) \Bigr) \right],

    which is known as a Gregory integration formula.

  3. ⌨ Repeat each integral in Exercise 1 above using Gregory integration (5.6.19) instead of the trapezoid formula. Compare the observed errors to fourth-order convergence.

  1. ✍ Simpson’s formula can be derived without appealing to extrapolation.

    (a) Show that

    p(x)=β+γα2hx+α2β+γ2h2x2p(x) = \beta + \frac{\gamma-\alpha}{2h}\, x + \frac{\alpha-2\beta+\gamma}{2h^2}\, x^2

    interpolates the three points (h,α)(-h,\alpha), (0,β)(0,\beta), and (h,γ)(h,\gamma).

    (b) Find

    hhp(s)ds, \int_{-h}^h p(s)\, ds,

    where pp is the quadratic polynomial from part (a), in terms of hh, α, β, and γ.

    (c) Assume equally spaced nodes in the form ti=a+iht_i=a+ih, for h=(ba)/nh=(b-a)/n and i=0,,ni=0,\ldots,n. Suppose ff is approximated by p(x)p(x) over the subinterval [ti1,ti+1][t_{i-1},t_{i+1}]. Apply the result from part (b) to find

    ti1ti+1f(x)dxh3[f(ti1)+4f(ti)+f(ti+1)]. \int_{t_{i-1}}^{t_{i+1}} f(x)\, dx \approx \frac{h}{3} \bigl[ f(t_{i-1}) + 4f(t_i) + f(t_{i+1}) \bigr].

    (Use the change of variable s=xtis=x-t_i.)

    (d) Now also assume that n=2mn=2m for an integer mm. Derive Simpson’s formula,

    abf(x)dxh3[f(t0)+4f(t1)+2f(t2)+4f(t3)+2f(t4)++2f(tn2)+4f(tn1)+f(tn)]. \begin{split} \int_a^b f(x)\, dx \approx \frac{h}{3}\bigl[ &f(t_0) + 4f(t_1) + 2f(t_2) + 4f(t_3) + 2f(t_4) + \cdots\\ &+ 2f(t_{n-2}) + 4f(t_{n-1}) + f(t_n) \bigr]. \end{split}
  2. ✍ Show that the Simpson formula (5.6.23) is equivalent to Sf(n/2)S_f(n/2), given the definition of SfS_f in (5.6.14).

  3. ⌨ For each integral in Exercise 1 above, apply the Simpson formula (5.6.23) and compare the errors to fourth-order convergence.

  4. ⌨ For n=10,20,30,,200n=10,20,30,\ldots,200, compute the trapezoidal approximation to

    0112.01+sin(6πx)cos(2πx)dx0.9300357672424684.\int_{0}^1 \frac{1}{2.01+\sin (6\pi x)-\cos(2\pi x)} \,d x \approx 0.9300357672424684.

    Make two separate plots of the absolute error as a function of nn, one using a log-log scale and the other using log-linear. The graphs suggest that the error asymptotically behaves as CαnC \alpha^n for some C>0C>0 and some 0<α<10<\alpha<1. How does this result relate to (5.6.9)?

  5. ⌨ For each integral in Exercise 1 above, extrapolate the trapezoidal results two levels to get sixth-order accurate results, and compute the errors for each value.

  6. ✍ Find a formula like (5.6.17) that extrapolates two values of RfR_f to obtain an eighth-order accurate one.

Footnotes
  1. Some texts distinguish between a formula for a single subinterval [tk1,tk][t_{k-1},t_k] and a composite formula that adds them up over the whole interval to get (5.6.5).

References
  1. Bailey, D. H., Jeyabalan, K., & Li, X. S. (2005). A Comparison of Three High-Precision Quadrature Schemes. Experimental Mathematics, 14(3), 317–329. 10.1080/10586458.2005.10128931