Predicting how long an algorithm will take to solve a particular problem, on a particular computer, as written in a particular way in a particular programming language, is an enormously difficult undertaking. It’s more practical to predict how the required time will scale as a function of the size of the problem. In the case of a linear system of equations, the problem size is n n n , the number of equations and variables in the system. Because expressions of computational time are necessarily approximate, it’s customary to suppress all but the term that is dominant as n → ∞ n\to\infty n → ∞ . We first need to build some terminology for these expressions.
2.5.1 Asymptotic analysis ¶ Definition 2.5.1 (Asymptotic notation)
Let f ( n ) f(n) f ( n ) and g ( n ) g(n) g ( n ) be positive-valued functions. We say f ( n ) = O ( g ( n ) ) f(n)=O(g(n)) f ( n ) = O ( g ( n )) (read “f f f is big-O of g g g ”) as n → ∞ n\rightarrow \infty n → ∞ if f ( n ) / g ( n ) f(n)/g(n) f ( n ) / g ( n ) is bounded above as n → ∞ n\to\infty n → ∞ .
We say f ( n ) ∼ g ( n ) f(n)\sim g(n) f ( n ) ∼ g ( n ) (read “f f f is asymptotic to g g g ”) as n → ∞ n\rightarrow \infty n → ∞ if f ( n ) / g ( n ) → 1 f(n)/g(n)\rightarrow 1 f ( n ) / g ( n ) → 1 as n → ∞ n\rightarrow\infty n → ∞ .
One immediate consequence is that f ∼ g f\sim g f ∼ g implies f = O ( g ) f=O(g) f = O ( g ) .[1]
Consider the functions f ( n ) = a 1 n 3 + b 1 n 2 + c 1 n f(n) = a_1 n^3 + b_1 n^2 + c_1 n f ( n ) = a 1 n 3 + b 1 n 2 + c 1 n and g ( n ) = a 2 n 3 g(n) = a_2 n^3 g ( n ) = a 2 n 3 in the limit n → ∞ n\to \infty n → ∞ . Then
lim n → ∞ f ( n ) g ( n ) = lim n → ∞ a 1 + b 1 n − 1 + c 1 n − 2 a 2 = a 1 a 2 . \lim_{n \to \infty} \frac{f(n)}{g(n)}
= \lim_{n \to \infty} \frac{a_1 + b_1n^{-1} + c_1n^{-2}}{a_2} =
\frac{a_1}{a_2} . n → ∞ lim g ( n ) f ( n ) = n → ∞ lim a 2 a 1 + b 1 n − 1 + c 1 n − 2 = a 2 a 1 . Since a 1 / a 2 a_1/a_2 a 1 / a 2 is a constant, f ( n ) = O ( g ( n ) ) f(n) = O(g(n)) f ( n ) = O ( g ( n )) ; if a 1 = a 2 a_1=a_2 a 1 = a 2 , then f ∼ g f \sim g f ∼ g .
Consider f ( n ) = sin ( 1 / n ) f(n) = \sin (1/n) f ( n ) = sin ( 1/ n ) , g ( n ) = 1 / n g(n)=1/n g ( n ) = 1/ n and h ( n ) = 1 / n 2 h(n) = 1/n^2 h ( n ) = 1/ n 2 . For large n n n , Taylor’s theorem with remainder implies that
f ( n ) = 1 n − cos ( 1 / ξ ) 1 6 n 3 , f(n) = \frac{1}{n} - \cos(1/\xi)\frac{1}{6 n^3}, f ( n ) = n 1 − cos ( 1/ ξ ) 6 n 3 1 , where n < ξ < ∞ n<\xi<\infty n < ξ < ∞ . But
lim n → ∞ f g = lim n → ∞ 1 − cos ( 1 / ξ ) 1 6 n 2 = 1 , \lim_{n\to \infty} \frac{f}{g} = \lim_{n\to \infty} 1-\cos(1/\xi)\frac{1}{6 n^2} = 1, n → ∞ lim g f = n → ∞ lim 1 − cos ( 1/ ξ ) 6 n 2 1 = 1 , and so f ∼ g f \sim g f ∼ g . On the other hand, comparing f f f and h h h , we find
lim n → ∞ f h = lim n → ∞ n − cos ( 1 / ξ ) 1 6 n = ∞ , \lim_{n\to \infty} \frac{f}{h} = \lim_{n\to \infty} n-\cos(1/\xi)\frac{1}{6 n} = \infty, n → ∞ lim h f = n → ∞ lim n − cos ( 1/ ξ ) 6 n 1 = ∞ , so we cannot say that f = O ( h ) f = O(h) f = O ( h ) . A consideration of h / f h/f h / f will show that h = O ( f ) h = O(f) h = O ( f ) , however.
It’s conventional to use asymptotic notation that is as specific as possible. For instance, while it is true that n 2 + n = O ( n 10 ) n^2+n=O(n^{10}) n 2 + n = O ( n 10 ) , it’s more informative, and usually expected, to say n 2 + n = O ( n 2 ) n^2+n=O(n^2) n 2 + n = O ( n 2 ) . There are additional notations that enforce this requirement strictly, but we will just stick to the informal understanding.
There is a memorable way to use asymptotic notation to simplify sums:
∑ k = 1 n k ∼ n 2 2 = O ( n 2 ) , as n → ∞ , ∑ k = 1 n k 2 ∼ n 3 3 = O ( n 3 ) , as n → ∞ , ⋮ ∑ k = 1 n k p ∼ n p + 1 p + 1 = O ( n p + 1 ) , as n → ∞ . \begin{split}
\sum_{k=1}^n k&\sim \frac{n^2}{2} = O(n^2), \text{ as $n\to\infty$}, \\
\sum_{k=1}^n k^2 &\sim \frac{n^3}{3} = O(n^3), \text{ as $n\to\infty$}, \\
&\vdots \\
\sum_{k=1}^n k^p &\sim \frac{n^{p+1}}{p+1} = O(n^{p+1}), \text{ as $n\to\infty$}.
\end{split} k = 1 ∑ n k k = 1 ∑ n k 2 k = 1 ∑ n k p ∼ 2 n 2 = O ( n 2 ) , as n → ∞ , ∼ 3 n 3 = O ( n 3 ) , as n → ∞ , ⋮ ∼ p + 1 n p + 1 = O ( n p + 1 ) , as n → ∞ . These formulas greatly resemble the definite integral of x p x^p x p .
∑ k = 1 n − 1 4 k 2 + 3 = 4 ( ∑ k = 1 n − 1 k 2 ) + 3 ∑ k = 1 n − 1 1 ∼ 4 ( 1 3 ( n − 1 ) 3 ) + 3 ( n − 1 ) = 4 3 ( n 3 − 3 n 2 + 3 n − 1 ) + 3 n − 3 ∼ 4 3 n 3 . \begin{align*}
\sum_{k=1}^{n-1} 4k^2 + 3 & = 4 \left( \sum_{k=1}^{n-1} k^2\right) + 3 \sum_{k=1}^{n-1} 1\\
&\sim 4 \left( \frac{1}{3} (n-1)^3 \right) + 3(n-1) \\
& = \frac{4}{3} (n^3 - 3n^2 + 3n - 1) + 3n - 3 \\
&\sim \frac{4}{3} n^3.
\end{align*} k = 1 ∑ n − 1 4 k 2 + 3 = 4 ( k = 1 ∑ n − 1 k 2 ) + 3 k = 1 ∑ n − 1 1 ∼ 4 ( 3 1 ( n − 1 ) 3 ) + 3 ( n − 1 ) = 3 4 ( n 3 − 3 n 2 + 3 n − 1 ) + 3 n − 3 ∼ 3 4 n 3 . 2.5.2 Flop counting ¶ Traditionally, in numerical linear algebra we count floating-point operations , or flops for short. In our interpretation each scalar addition, subtraction, multiplication, division, and square root counts as one flop. Given any algorithm, we simply add up the number of scalar flops and ignore everything else.
Example 2.5.4 (Floating-point operations in matrix-vector multiplication)
Example 2.5.4 Here is a straightforward implementation of matrix-vector multiplication.
n = 6
A = randn(n, n)
x = rand(n)
y = zeros(n)
for i in 1:n
for j in 1:n
y[i] += A[i, j] * x[j] # 1 multiply, 1 add
end
end
Each of the loops implies a summation of flops. The total flop count for this algorithm is
∑ i = 1 n ∑ j = 1 n 2 = ∑ i = 1 n 2 n = 2 n 2 . \sum_{i=1}^n \sum_{j=1}^n 2 = \sum_{i=1}^n 2n = 2n^2. i = 1 ∑ n j = 1 ∑ n 2 = i = 1 ∑ n 2 n = 2 n 2 . Since the matrix A \mathbf{A} A has n 2 n^2 n 2 elements, all of which have to be involved in the product, it seems unlikely that we could get a flop count that is smaller than O ( n 2 ) O(n^2) O ( n 2 ) in general.
Let’s run an experiment with the built-in matrix-vector multiplication. Note that Julia is unusual in that loops have a variable scope separate from its enclosing code. Thus, for n in n
below means that inside the loop, the name n
will take on each one of the values that were previously assigned to the vector n
.
Tip
The push!
function attaches a new value to the end of a vector.
n = 1000:1000:5000
t = []
for n in n
A = randn(n, n)
x = randn(n)
time = @elapsed for j in 1:80; A * x; end
push!(t, time)
end
The reason for doing multiple repetitions at each value of n n n in the loop above is to avoid having times so short that the resolution of the timer is significant.
@pt :header = ["size", "time (sec.)"] [n t]
Looking at the timings just for n = 2000 n=2000 n = 2000 and n = 4000 n=4000 n = 4000 , they have ratio
Tip
The expression n.==4000
here produces a vector of Boolean (true/false) values the same size as n
. This result is used to index within t
, accessing only the value for which the comparison is true.
@show t[n.==4000] ./ t[n.==2000];
t[n .== 4000] ./ t[n .== 2000] = [1.4117809950868896]
If the run time is dominated by flops, then we expect this ratio to be
2 ( 4000 ) 2 2 ( 2000 ) 2 = 4. \frac{2(4000)^2}{2(2000)^2}=4. 2 ( 2000 ) 2 2 ( 4000 ) 2 = 4. Example 2.5.4 Here is a straightforward implementation of matrix-vector multiplication.
n = 6;
A = magic(n);
x = ones(n,1);
y = zeros(n,1);
for i = 1:n
for j = 1:n
y(i) = y(i) + A(i,j)*x(j); % 2 flops
end
end
Each of the loops implies a summation of flops. The total flop count for this algorithm is
∑ i = 1 n ∑ j = 1 n 2 = ∑ i = 1 n 2 n = 2 n 2 . \sum_{i=1}^n \sum_{j=1}^n 2 = \sum_{i=1}^n 2n = 2n^2. i = 1 ∑ n j = 1 ∑ n 2 = i = 1 ∑ n 2 n = 2 n 2 . Since the matrix A \mathbf{A} A has n 2 n^2 n 2 elements, all of which have to be involved in the product, it seems unlikely that we could get a flop count that is smaller than O ( n 2 ) O(n^2) O ( n 2 ) in general.
Let’s run an experiment with the built-in matrix-vector multiplication, using tic
and toc
to time the operation.
n = (500:500:5000)';
t = zeros(size(n));
for k = 1:length(n)
A = randn(n(k), n(k)); x = randn(n(k), 1);
tic % start a timer
for j = 1:200 % repeat 100 times
A*x;
end
time = toc; % read the timer
t(k) = time / 200; % seconds per instance
end
The reason for doing multiple repetitions at each value of n n n in the loop above is to avoid having times so short that the resolution of the timer is significant.
table(n, t, 'variablenames', ["size", "time"])
Looking at the timings just for n = 2500 n=2500 n = 2500 and n = 5000 n=5000 n = 5000 , they have ratio
Tip
The expression n==5000
here produces a vector of Boolean (true/false) values the same size as n
. This result is used to index within t
, accessing only the value for which the comparison is true.
If the run time is dominated by flops, then we expect this ratio to be
2 ( 5000 ) 2 2 ( 2500 ) 2 = 4. \frac{2(5000)^2}{2(2500)^2}=4. 2 ( 2500 ) 2 2 ( 5000 ) 2 = 4. Example 2.5.4 Here is a straightforward implementation of matrix-vector multiplication.
n = 6
A = random.rand(n, n)
x = ones(n)
y = zeros(n)
for i in range(n):
for j in range(n):
y[i] += A[i, j] * x[j] # 2 flops
Each of the loops implies a summation of flops. The total flop count for this algorithm is
∑ i = 1 n ∑ j = 1 n 2 = ∑ i = 1 n 2 n = 2 n 2 . \sum_{i=1}^n \sum_{j=1}^n 2 = \sum_{i=1}^n 2n = 2n^2. i = 1 ∑ n j = 1 ∑ n 2 = i = 1 ∑ n 2 n = 2 n 2 . Since the matrix A \mathbf{A} A has n 2 n^2 n 2 elements, all of which have to be involved in the product, it seems unlikely that we could get a flop count that is smaller than O ( n 2 ) O(n^2) O ( n 2 ) in general.
Let’s run an experiment with the built-in matrix-vector multiplication. We assume that flops dominate the computation time and thus measure elapsed time.
N = 400 * arange(1, 11)
t = []
print(" n t")
for i, n in enumerate(N):
A = random.randn(n, n)
x = random.randn(n)
start = timer()
for j in range(50): A @ x
t.append(timer() - start)
print(f"{n:5} {t[-1]:10.3e}")
The reason for doing multiple repetitions at each value of n n n above is to avoid having times so short that the resolution of the timer is a factor.
Looking at the timings just for n = 2000 n=2000 n = 2000 and n = 4000 n=4000 n = 4000 , they have ratio:
If the run time is dominated by flops, then we expect this ratio to be
2 ( 4000 ) 2 2 ( 2000 ) 2 = 4. \frac{2(4000)^2}{2(2000)^2}=4. 2 ( 2000 ) 2 2 ( 4000 ) 2 = 4. Suppose that the running time t t t of an algorithm obeys a function that is O ( n p ) O(n^p) O ( n p ) . For sufficiently large n n n , t ≈ C n p t\approx Cn^p t ≈ C n p for a constant C C C should be a good approximation. Hence
t ≈ C n p ⟺ log t ≈ p ( log n ) + log C . t \approx Cn^p \qquad \Longleftrightarrow \qquad \log t \approx p(\log n) + \log C. t ≈ C n p ⟺ log t ≈ p ( log n ) + log C . So we expect that a graph of log t \log t log t as a function of log n \log n log n will be a straight line of slope p p p .
Example 2.5.5 (Asymptotics in log-log plots)
Example 2.5.5 Let’s repeat the experiment of the previous figure for more, and larger, values of n n n .
randn(5,5)*randn(5); # throwaway to force compilation
n = 400:200:6000
t = []
for n in n
A = randn(n, n)
x = randn(n)
time = @elapsed for j in 1:50; A * x; end
push!(t, time)
end
Plotting the time as a function of n n n on log-log scales is equivalent to plotting the logs of the variables.
scatter(n, t, label="data", legend=false,
xaxis=(:log10, L"n"), yaxis=(:log10, "elapsed time (sec)"),
title="Timing of matrix-vector multiplications")
You can see that while the full story is complicated, the graph is trending to a straight line of positive slope. For comparison, we can plot a line that represents O ( n 2 ) O(n^2) O ( n 2 ) growth exactly. (All such lines have slope equal to 2.)
plot!(n, t[end] * (n/n[end]).^2, l=:dash,
label=L"O(n^2)", legend=:topleft)
Example 2.5.5 Let’s repeat the previous experiment for more, and larger, values of n n n .
n = (400:400:6000)';
t = zeros(size(n));
for k = 1:length(n)
A = randn(n(k), n(k)); x = randn(n(k), 1);
tic % start a timer
for j = 1:100 % repeat ten times
A*x;
end
time = toc; % read the timer
t(k) = time / 100; % seconds per instance
end
Plotting the time as a function of n n n on log-log scales is equivalent to plotting the logs of the variables.
clf % clear any existing figure
loglog(n, t, 'o-')
xlabel('size of matrix')
ylabel('time (sec)')
title('Timing of matrix-vector multiplications')
You can see that while the full story is complicated, the graph is trending to a straight line of positive slope. For comparison, we can plot a line that represents O ( n 2 ) O(n^2) O ( n 2 ) growth exactly. (All such lines have slope equal to 2.)
hold on
loglog(n, 0.5 * t(end) * (n / n(end)).^2, 'k--')
axis tight
legend('data', 'O(n^2)', 'location', 'southeast')
Example 2.5.5 Let’s repeat the experiment of the previous example for more, and larger, values of n n n .
N = arange(400, 6200, 200)
t = zeros(len(N))
for i, n in enumerate(N):
A = random.randn(n,n)
x = random.randn(n)
start = timer()
for j in range(20): A@x
t[i] = timer() - start
Plotting the time as a function of n n n on log-log scales is equivalent to plotting the logs of the variables, but is formatted more neatly.
fig, ax = subplots()
ax.loglog(N, t, "-o", label="observed")
ylabel("elapsed time (sec)");
xlabel("$n$");
title("Timing of matrix-vector multiplications");
You can see that while the full story is complicated, the graph is trending to a straight line of positive slope. For comparison, we can plot a line that represents O ( n 2 ) O(n^2) O ( n 2 ) growth exactly. (All such lines have slope equal to 2.)
ax.loglog(N, t[-1] * (N/N[-1])**2, "--", label="$O(n^2)$")
ax.legend(); fig
2.5.3 Solution of linear systems ¶ Recall the steps of Algorithm 2.4.2 for the system A x = b \mathbf{A}\mathbf{x}=\mathbf{b} Ax = b :
Factor L U = A \mathbf{L}\mathbf{U}=\mathbf{A} LU = A using Gaussian elimination. Solve L z = b \mathbf{L}\mathbf{z}=\mathbf{b} Lz = b for z \mathbf{z} z using forward substitution. Solve U x = z \mathbf{U}\mathbf{x}=\mathbf{z} Ux = z for x \mathbf{x} x using backward substitution. The second and third steps are solved by Function 2.3.1 and Function 2.3.2 . Only one line in each of these functions dominates the arithmetic. Take forwardsub
, for instance. It has a single flop in line 11. Line 13 computes
sum( L[i,j]*x[j] for j in 1:i-1 )
This line requires i − 1 i-1 i − 1 multiplications and ( i − 2 ) (i-2) ( i − 2 ) additions, for a total of 2 i − 3 2i-3 2 i − 3 flops. Line 14 adds two more flops. These lines are performed within a loop as i i i ranges from 1 to n n n , so the total count is
1 + ∑ i = 1 n ( 2 i − 3 ) = 1 − 3 n + 2 ∑ i = 1 n i . 1 + \sum_{i=1}^n (2i-3) = 1 - 3n + 2 \sum_{i=1}^n i. 1 + i = 1 ∑ n ( 2 i − 3 ) = 1 − 3 n + 2 i = 1 ∑ n i . It is not hard to find an exact formula for the sum, but we use (2.5.5) to simplify it to ∼ n 2 \sim n^2 ∼ n 2 . After all, since flop counting is only an approximation of true running time, why bother with the more complicated exact expression? An analysis of backward substitution yields the same result.
Before counting flops for the LU factorization, we have to admit that Function 2.4.1 is not written as economically as it could be. Recall from our motivating example in Demo 2.4.3 that we zero out the first row and column of A \mathbf{A} A with the first outer product, the second row and column with the second outer product, and so on. There is no good reason to do multiplications and additions with values known to be zero, so we could replace lines 15–19 of lufact
with
15
16
17
18
19
for k in 1:n-1
U[k,k:n] = Aₖ[k,k:n]
L[k:n,k] = Aₖ[k:n,k]/U[k,k]
Aₖ[k:n,k:n] -= L[k:n,k]*U[k,k:n]'
end
We will use the following handy fact.
Line 17 above divides each element of the vector Aₖ[k:n,k]
by a scalar. Hence the number of flops equals the length of the vector, which is n − k + 1 n-k+1 n − k + 1 .
Line 18 has an outer product followed by a matrix subtraction. The definition (5) of the outer product makes it clear that that computation takes one flop (multiplication) per element of the result, which here results in ( n − k + 1 ) 2 (n-k+1)^2 ( n − k + 1 ) 2 flops. The number of subtractions is identical.
Altogether the factorization takes
∑ k = 1 n − 1 n − k + 1 + 2 ( n − k + 1 ) 2 . \sum_{k=1}^{n-1} n-k + 1 + 2(n-k+1)^2. k = 1 ∑ n − 1 n − k + 1 + 2 ( n − k + 1 ) 2 . There are different ways to simplify this expression. We will make a change of summation index using j = n − k j=n-k j = n − k . The endpoints of the sum are j = n − 1 j=n-1 j = n − 1 when k = 1 k=1 k = 1 and j = 1 j=1 j = 1 when k = n − 1 k=n-1 k = n − 1 . Since the order of terms in a sum doesn’t matter, we get
∑ j = 1 n − 1 1 + j + 2 ( j + 1 ) 2 = ∑ j = 1 n − 1 3 + 5 j + 2 j 2 ∼ 3 ( n − 1 ) + 5 2 ( n − 1 ) 2 + 2 3 ( n − 1 ) 3 ∼ 2 3 n 3 . \begin{align*}
\sum_{j=1}^{n-1} 1+j+2(j+1)^2 &= \sum_{j=1}^{n-1} 3 + 5j + 2j^2 \\
& \sim 3(n-1) + \frac{5}{2}(n-1)^2 + \frac{2}{3}(n-1)^3 \\
& \sim \frac{2}{3}n^3.
\end{align*} j = 1 ∑ n − 1 1 + j + 2 ( j + 1 ) 2 = j = 1 ∑ n − 1 3 + 5 j + 2 j 2 ∼ 3 ( n − 1 ) + 2 5 ( n − 1 ) 2 + 3 2 ( n − 1 ) 3 ∼ 3 2 n 3 . We have proved the following.
Theorem 2.5.1 (Efficiency of LU factorization)
The LU factorization of an n × n n\times n n × n matrix takes ∼ 2 3 n 3 \sim\frac{2}{3}n^3 ∼ 3 2 n 3 flops as n → ∞ n\to \infty n → ∞ . This dominates the flops for solving an n × n n\times n n × n linear system.
Example 2.5.6 (Floating-point operations in LU factorization)
Example 2.5.6 We’ll test the conclusion of O ( n 3 ) O(n^3) O ( n 3 ) flops experimentally, using the built-in lu
function instead of the purely instructive lufact
.
Tip
The first time a function is invoked, there may be significant time needed to compile it in memory. Thus, when timing a function, run it at least once before beginning the timing.
lu(randn(3, 3)); # throwaway to force compilation
n = 400:400:4000
t = []
for n in n
A = randn(n, n)
time = @elapsed for j in 1:12; lu(A); end
push!(t, time)
end
We plot the timings on a log-log graph and compare it to O ( n 3 ) O(n^3) O ( n 3 ) . The result could vary significantly from machine to machine, but in theory the data should start to parallel the line as n → ∞ n\to\infty n → ∞ .
scatter(n, t, label="data", legend=:topleft,
xaxis=(:log10, L"n"), yaxis=(:log10, "elapsed time"))
plot!(n, t[end ]* (n/n[end]).^3, l=:dash, label=L"O(n^3)")
Example 2.5.6 We’ll test the conclusion of O ( n 3 ) O(n^3) O ( n 3 ) flops experimentally, using the built-in lu
function instead of the purely instructive lufact
.
Tip
The first time a function is invoked, there may be significant time needed to compile it in memory. Thus, when timing a function, run it at least once before beginning the timing.
n = (200:100:2400)';
t = zeros(size(n));
for k = 1:length(n)
A = randn(n(k), n(k));
tic % start a timer
for j = 1:6, [L, U] = lu(A); end
time = toc;
t(k) = time / 6;
end
We plot the timings on a log-log graph and compare it to O ( n 3 ) O(n^3) O ( n 3 ) . The result could vary significantly from machine to machine, but in theory the data should start to parallel the line as n → ∞ n\to\infty n → ∞ .
clf
loglog(n, t, 'o-')
hold on
loglog(n, 0.5 * t(end) * (n/n(end)).^3, 'k--')
axis tight
xlabel('size of matrix'), ylabel('time (sec)')
title('Timing of LU factorization')
legend('lu','O(n^3)','location','southeast');
Example 2.5.6 We’ll test the conclusion of O ( n 3 ) O(n^3) O ( n 3 ) flops experimentally using the lu
function imported from scipi.linalg
.
from scipy.linalg import lu
N = arange(200, 2600, 200)
t = zeros(len(N))
for i, n in enumerate(N):
A = random.randn(n,n)
start = timer()
for j in range(5): lu(A)
t[i] = timer() - start
We plot the timings on a log-log graph and compare it to O ( n 3 ) O(n^3) O ( n 3 ) . The result could vary significantly from machine to machine, but in theory the data should start to parallel the line as n → ∞ n\to\infty n → ∞ .
loglog(N, t, "-o", label="obseved")
loglog(N, t[-1] * (N / N[-1])**3, "--", label="$O(n^3)$")
legend();
xlabel("$n$");
ylabel("elapsed time (sec)");
title("Timing of LU factorizations");
In practice, flops are not the only aspect of an implementation that occupies significant time. Our position is that counting flops as a measure of performance is a useful oversimplification. We will assume that LU factorization (and as a result, the solution of a linear system of n n n equations) requires a real-world time that is roughly O ( n 3 ) O(n^3) O ( n 3 ) . This growth rate is a great deal more tolerable than, say, O ( 2 n ) O(2^n) O ( 2 n ) , but it does mean that for (at this writing) n n n greater than 10,000 or so, something other than general LU factorization will have to be used.
2.5.4 Exercises ¶ ✍ The following are asymptotic assertions about the limit n → ∞ n\rightarrow\infty n → ∞ . In each case, prove the statement true or false.
(a) n 2 = O ( log n ) , n^2 = O(\log n),\quad n 2 = O ( log n ) ,
(b) n a = O ( n b ) n^{a} = O(n^b) n a = O ( n b ) if a ≤ b , a\le b,\quad a ≤ b ,
(c) e n ∼ e 2 n , e^n \sim e^{2n},\quad e n ∼ e 2 n ,
(d) n + n ∼ n + 2 n n+\sqrt{n}\sim n+2\sqrt{n} n + n ∼ n + 2 n .
✍ The following are asymptotic assertions about the limit h → 0 h\to 0 h → 0 . In each case, prove the statement true or false.
(a) h 2 log ( h ) = O ( h 3 ) , h^2\log(h) = O(h^3),\quad h 2 log ( h ) = O ( h 3 ) ,
(b) h a = O ( h b ) h^{a} = O(h^b) h a = O ( h b ) if a < b , a < b,\quad a < b ,
(c) sin ( h ) ∼ h , \sin(h) \sim h,\quad sin ( h ) ∼ h ,
(d) ( e 2 h − 1 ) ∼ h (e^{2h}-1)\sim h ( e 2 h − 1 ) ∼ h .
✍ Show that the inner product of two n n n -vectors takes exactly 2 n − 1 2n-1 2 n − 1 flops.
✍ Show that the multiplication of two n × n n\times n n × n matrices takes ∼ 2 n 3 \sim 2n^3 ∼ 2 n 3 flops.
✍ This problem is about evaluation of a polynomial c 1 + c 2 x + ⋯ + c n x n − 1 c_1 + c_2 x + \cdots + c_{n}x^{n-1} c 1 + c 2 x + ⋯ + c n x n − 1 .
(a) Here is a little code to do the evaluation.
y = c[1]
xpow = 1
for i in 2:n
xpow *= x
y += c[i]*xpow
end
Assuming that x
is a scalar, how many flops does this function take, as a function of n n n ?
(b) Compare the count from (a) to the flop count for Horner’s algorithm, Function 1.3.1 .
The exact sums for p = 1 , 2 p=1,2 p = 1 , 2 in (2.5.5) are as follows:
∑ k = 1 n k = n ( n + 1 ) 2 , ∑ k = 1 n k 2 = n ( n + 1 ) ( 2 n + 1 ) 6 . \sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad
\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}. k = 1 ∑ n k = 2 n ( n + 1 ) , k = 1 ∑ n k 2 = 6 n ( n + 1 ) ( 2 n + 1 ) . (a) ✍ Use these to find the exact result for (2.5.9) .
(b) ⌨ Plot the ratio of your result from (a) and the asymptotic result 2 n 3 / 3 2n^3/3 2 n 3 /3 for all n = 1 0 1 + 0.03 i n=10^{1+0.03i} n = 1 0 1 + 0.03 i , i = 0 , … , 100 i=0,\dots,100 i = 0 , … , 100 , using a log scale for n n n and a linear scale for the ratio. (The curve should approach 1 asymptotically.)
✍ Show that for any nonnegative constant integer m m m ,
∑ k = 0 n − m k p ∼ n p + 1 p + 1 . \sum_{k=0}^{n-m} k^p \sim \frac{n^{p+1}}{p+1}. k = 0 ∑ n − m k p ∼ p + 1 n p + 1 . ⌨ The UpperTriangular
and LowerTriangular
matrix types cause specialized algorithms to be invoked by the backslash. Define
A = rand(1000,1000)
B = tril(A)
C = LowerTriangular(B)
b = rand(1000)
Using @elapsed
with the backslash solver, time how long it takes to solve the linear system A x = b \mathbf{A}\mathbf{x}=\mathbf{b} Ax = b 100 times; then, do the same for matrices B \mathbf{B} B and C \mathbf{C} C . Is the timing for B \mathbf{B} B closer to A \mathbf{A} A or to C \mathbf{C} C ? (Hint: Remember to make one timing run without recording results, so that compilation time is not counted.)