
Efficiency of matrix computations

Predicting how long an algorithm will take to solve a particular problem, on a particular computer, as written in a particular way in a particular programming language, is an enormously difficult undertaking. It’s more practical to predict how the required time will scale as a function of the size of the problem. In the case of a linear system of equations, the problem size is $n$, the number of equations and variables in the system. Because expressions of computational time are necessarily approximate, it’s customary to suppress all but the term that is dominant as $n\to\infty$. We first need to build some terminology for these expressions.

2.5.1 Asymptotic analysis

We say that $f(n)=O(g(n))$ ("$f$ is big-O of $g$") as $n\to\infty$ if $|f(n)/g(n)|$ is bounded for all sufficiently large $n$, and that $f(n)\sim g(n)$ ("$f$ is asymptotic to $g$") if $f(n)/g(n)\to 1$ as $n\to\infty$. One immediate consequence is that $f\sim g$ implies $f=O(g)$.[1]

It’s conventional to use asymptotic notation that is as specific as possible. For instance, while it is true that $n^2+n=O(n^{10})$, it’s more informative, and usually expected, to say $n^2+n=O(n^2)$. There are additional notations that enforce this requirement strictly, but we will just stick to the informal understanding.

There is a memorable way to use asymptotic notation to simplify sums:

$$
\begin{split}
\sum_{k=1}^n k &\sim \frac{n^2}{2} = O(n^2), \text{ as } n\to\infty, \\
\sum_{k=1}^n k^2 &\sim \frac{n^3}{3} = O(n^3), \text{ as } n\to\infty, \\
&\vdots \\
\sum_{k=1}^n k^p &\sim \frac{n^{p+1}}{p+1} = O(n^{p+1}), \text{ as } n\to\infty.
\end{split}
$$

These formulas greatly resemble the definite integral of $x^p$.
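As a quick numerical illustration (not part of the text), the short Julia snippet below compares each partial sum to $n^{p+1}/(p+1)$; the ratio should approach 1 as $n$ grows.

# Compare sum(k^p, k = 1..n) against n^(p+1)/(p+1); the ratio tends to 1.
for p in 1:3, n in (100, 10_000)
    ratio = sum(k^p for k in 1:n) / (n^(p + 1) / (p + 1))
    println("p = $p, n = $n:  ratio = ", round(ratio, digits=4))
end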

2.5.2 Flop counting

Traditionally, in numerical linear algebra we count floating-point operations, or flops for short. In our interpretation each scalar addition, subtraction, multiplication, division, and square root counts as one flop. Given any algorithm, we simply add up the number of scalar flops and ignore everything else.

Suppose that the running time $t$ of an algorithm obeys a function that is $O(n^p)$. For sufficiently large $n$, $t\approx Cn^p$ for a constant $C$ should be a good approximation. Hence

$$
t \approx Cn^p \qquad \Longleftrightarrow \qquad \log t \approx p(\log n) + \log C.
$$

So we expect that a graph of $\log t$ as a function of $\log n$ will be a straight line of slope $p$.
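For instance, one can estimate $p$ from measured running times by a least-squares fit in log-log coordinates. The sketch below (an illustration, not from the text) times dense matrix-matrix multiplication at a few sizes; the particular sizes and the timed operation are arbitrary choices.

# Time an operation at several sizes, then fit log(t) ≈ p*log(n) + log(C).
ns = [500, 1000, 2000, 4000]
t = Float64[]
for n in ns
    A = rand(n, n)
    A * A                            # warm-up run so compilation isn't timed
    push!(t, @elapsed(A * A))
end
V = [log.(ns) ones(length(ns))]      # design matrix for the linear fit
p, logC = V \ log.(t)                # least-squares solution
println("estimated exponent p ≈ ", round(p, digits=2))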

2.5.3 Solution of linear systems

Recall the steps of Algorithm 2.4.2 for the system $\mathbf{A}\mathbf{x}=\mathbf{b}$:

  1. Factor $\mathbf{L}\mathbf{U}=\mathbf{A}$ using Gaussian elimination.
  2. Solve $\mathbf{L}\mathbf{z}=\mathbf{b}$ for $\mathbf{z}$ using forward substitution.
  3. Solve $\mathbf{U}\mathbf{x}=\mathbf{z}$ for $\mathbf{x}$ using backward substitution.

The second and third steps are carried out by Function 2.3.1 and Function 2.3.2. Only one line in each of these functions dominates the arithmetic. Take forwardsub, for instance. It has a single flop in line 11. Line 13 computes

sum( L[i,j]*x[j] for j in 1:i-1 )

This line requires $i-1$ multiplications and $i-2$ additions, for a total of $2i-3$ flops. Line 14 adds two more flops. These lines are performed within a loop as $i$ ranges from 1 to $n$, so the total count is

$$
1 + \sum_{i=1}^n (2i-1) = 1 - n + 2 \sum_{i=1}^n i.
$$

It is not hard to find an exact formula for the sum, but we use (2.5.5) to simplify it to $\sim n^2$. After all, since flop counting is only an approximation of true running time, why bother with the more complicated exact expression? An analysis of backward substitution yields the same result.

Before counting flops for the LU factorization, we have to admit that Function 2.4.1 is not written as economically as it could be. Recall from our motivating example in Demo 2.4.3 that we zero out the first row and column of $\mathbf{A}$ with the first outer product, the second row and column with the second outer product, and so on. There is no good reason to do multiplications and additions with values known to be zero, so we could replace lines 15–19 of lufact with

15  for k in 1:n-1
16      U[k,k:n] = Aₖ[k,k:n]
17      L[k:n,k] = Aₖ[k:n,k]/U[k,k]
18      Aₖ[k:n,k:n] -= L[k:n,k]*U[k,k:n]'
19  end
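To make the fragment above concrete, here is a minimal self-contained sketch built around the same loop. The name lufact_sketch and the surrounding setup are illustrative only (this is not the text’s Function 2.4.1), and no row pivoting is performed, so it should be applied only to matrices for which elimination succeeds without row swaps.

using LinearAlgebra

# Economical LU factorization sketch: A = L*U with unit lower triangular L.
function lufact_sketch(A)
    n = size(A, 1)
    Aₖ = float(copy(A))      # working copy, reduced one column at a time
    L = diagm(ones(n))       # unit lower triangular factor
    U = zeros(n, n)
    for k in 1:n-1
        U[k,k:n] = Aₖ[k,k:n]
        L[k:n,k] = Aₖ[k:n,k]/U[k,k]
        Aₖ[k:n,k:n] -= L[k:n,k]*U[k,k:n]'
    end
    U[n,n] = Aₖ[n,n]         # last pivot
    return L, U
end

A = rand(6, 6) + 6I          # diagonally dominant, so no pivoting is needed
L, U = lufact_sketch(A)
println(norm(A - L*U))       # should be near machine precision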

We will use the following handy fact.

Line 17 above divides each element of the vector Aₖ[k:n,k] by a scalar. Hence the number of flops equals the length of the vector, which is $n-k+1$.

Line 18 has an outer product followed by a matrix subtraction. The definition (5) of the outer product makes it clear that the computation takes one flop (multiplication) per element of the result, which here results in $(n-k+1)^2$ flops. The number of subtractions is identical.

Altogether the factorization takes

$$
\sum_{k=1}^{n-1} \bigl[ (n-k+1) + 2(n-k+1)^2 \bigr].
$$

There are different ways to simplify this expression. We will make a change of summation index using $j=n-k$. The endpoints of the sum are $j=n-1$ when $k=1$ and $j=1$ when $k=n-1$. Since the order of terms in a sum doesn’t matter, we get

$$
\begin{align*}
\sum_{j=1}^{n-1} \bigl[ 1+j+2(j+1)^2 \bigr] &= \sum_{j=1}^{n-1} \bigl[ 3 + 5j + 2j^2 \bigr] \\
& \sim 3(n-1) + \frac{5}{2}(n-1)^2 + \frac{2}{3}(n-1)^3 \\
& \sim \frac{2}{3}n^3.
\end{align*}
$$

We have proved the following: as $n\to\infty$, the LU factorization of an $n\times n$ matrix requires $\sim \frac{2}{3}n^3$ flops, which dominates the $\sim n^2$ flops of the two triangular substitutions.

In practice, flops are not the only aspect of an implementation that occupies significant time. Our position is that counting flops as a measure of performance is a useful oversimplification. We will assume that LU factorization (and as a result, the solution of a linear system of $n$ equations) requires a real-world time that is roughly $O(n^3)$. This growth rate is a great deal more tolerable than, say, $O(2^n)$, but it does mean that for (at this writing) $n$ greater than 10,000 or so, something other than general LU factorization will have to be used.
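As a rough empirical check of the $O(n^3)$ claim (an illustration, not part of the text; the sizes are arbitrary), one can time the backslash solver at sizes $n$ and $2n$: doubling $n$ should multiply the time by roughly 8, though multithreading and memory effects can blur the ratio at modest sizes.

# Time a dense solve at n and 2n; the ratio should be roughly 2^3 = 8.
function solve_time(n)
    A, b = rand(n, n), rand(n)
    A \ b                        # warm-up run so compilation isn't timed
    return @elapsed(A \ b)
end

t1, t2 = solve_time(1500), solve_time(3000)
println("time ratio = ", round(t2/t1, digits=1), "  (expect roughly 8)")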

2.5.4 Exercises

  1. ✍ The following are asymptotic assertions about the limit $n\rightarrow\infty$. In each case, prove the statement true or false.

    (a) $n^2 = O(\log n)$,  (b) $n^{a} = O(n^b)$ if $a\le b$,  (c) $e^n \sim e^{2n}$,  (d) $n+\sqrt{n}\sim n+2\sqrt{n}$.

  2. ✍ The following are asymptotic assertions about the limit $h\to 0$. In each case, prove the statement true or false.

    (a) $h^2\log(h) = O(h^3)$,  (b) $h^{a} = O(h^b)$ if $a < b$,  (c) $\sin(h) \sim h$,  (d) $(e^{2h}-1)\sim h$.

  3. ✍ Show that the inner product of two $n$-vectors takes exactly $2n-1$ flops.

  4. ✍ Show that the multiplication of two $n\times n$ matrices takes $\sim 2n^3$ flops.

  5. ✍ This problem is about evaluation of a polynomial $c_1 + c_2 x + \cdots + c_{n}x^{n-1}$.

    (a) Here is a little code to do the evaluation.

    y = c[1]
    xpow = 1
    for i in 2:n
        xpow *= x
        y += c[i]*xpow
    end

    Assuming that x is a scalar, how many flops does this code take, as a function of $n$?

    (b) Compare the count from (a) to the flop count for Horner’s algorithm, Function 1.3.1.

  6. The exact sums for $p=1,2$ in (2.5.5) are as follows:

    $$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}.$$

    (a) ✍ Use these to find the exact result for (2.5.9).

    (b) ⌨ Plot the ratio of your result from (a) to the asymptotic result $2n^3/3$ for all $n=10^{1+0.03i}$, $i=0,\dots,100$, using a log scale for $n$ and a linear scale for the ratio. (The curve should approach 1 asymptotically.)

  7. ✍ Show that for any nonnegative constant integer $m$,

    $$\sum_{k=0}^{n-m} k^p \sim \frac{n^{p+1}}{p+1}.$$

  8. ⌨ The UpperTriangular and LowerTriangular matrix types cause specialized algorithms to be invoked by the backslash. Define

    A = rand(1000,1000)
    B = tril(A)
    C = LowerTriangular(B)
    b = rand(1000)

    Using @elapsed with the backslash solver, time how long it takes to solve the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$ 100 times; then, do the same for matrices $\mathbf{B}$ and $\mathbf{C}$. Is the timing for $\mathbf{B}$ closer to $\mathbf{A}$ or to $\mathbf{C}$? (Hint: Remember to make one timing run without recording results, so that compilation time is not counted.)

Footnotes
  1. More precisely, $O(g)$ and $\sim g$ are sets of functions, and $\sim g$ is a subset of $O(g)$. That we write $f=O(g)$ rather than $f\in O(g)$ is a quirk of convention.