To this point we have dealt frequently with the solution of the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$. Alongside this problem in its importance to linear algebra is the eigenvalue problem.
Definition 7.2.1 (Eigenvalue and eigenvector)
Given a square matrix $\mathbf{A}$, if $\mathbf{A}\mathbf{x} = \lambda \mathbf{x}$ for a scalar $\lambda$ and a nonzero vector $\mathbf{x}$, then $\lambda$ is an eigenvalue and $\mathbf{x}$ is an associated eigenvector.
7.2.1 Complex matrices

A matrix with real entries can have complex eigenvalues. Therefore we assume all matrices, vectors, and scalars may be complex in what follows. Recall that a complex number can be represented as $a+ib$ for real $a$ and $b$, where $i^2=-1$. The complex conjugate of $x=a+ib$ is denoted $\bar{x}$ and is given by $\bar{x}=a-ib$. The magnitude or modulus of a complex number $z$ is

$$|z| = \sqrt{z\cdot \bar{z}}.$$

Definition 7.2.2 (Terms for complex matrices)
The adjoint or hermitian of a matrix $\mathbf{A}$ is denoted $\mathbf{A}^*$ and is given by $\mathbf{A}^*=(\overline{\mathbf{A}})^T=\overline{\mathbf{A}^T}$. The matrix is self-adjoint or hermitian if $\mathbf{A}^*=\mathbf{A}$.
The 2-norm of a complex vector $\mathbf{u}$ is $\sqrt{\mathbf{u}^*\mathbf{u}}$. Other vector norms, and all matrix norms, are as defined in Vector and matrix norms.
Complex vectors $\mathbf{u}$ and $\mathbf{v}$ of the same dimension are orthogonal if $\mathbf{u}^*\mathbf{v}=0$. Orthonormal vectors are mutually orthogonal and have unit 2-norm. A unitary matrix is a square matrix with orthonormal columns or, equivalently, a matrix satisfying $\mathbf{A}^* = \mathbf{A}^{-1}$.
For the most part, “adjoint” replaces “transpose,” “hermitian” replaces “symmetric,” and “unitary matrix” replaces “orthogonal matrix” when applying our previous results to complex matrices.
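These definitions are easy to verify numerically. The following short Julia sketch (an illustration added here, not one of the text's demos; the matrices are arbitrary) checks the adjoint identity, the hermitian property, and the unitary property.

using LinearAlgebra

A = rand(3, 3) + 1im * rand(3, 3)
@show A' ≈ conj.(transpose(A));      # the adjoint is the conjugate transpose

H = (A + A') / 2                     # one way to build a hermitian matrix
@show ishermitian(H);

Q, _ = qr(randn(4, 4) + 1im * randn(4, 4))
Q = Matrix(Q)                        # unitary factor from a QR factorization
@show opnorm(Q' * Q - I);            # near zero, so Q* acts as inv(Q)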
7.2.2 Eigenvalue decomposition

An easy rewrite of the eigenvalue definition (7.2.1) is that $(\mathbf{A} - \lambda\mathbf{I}) \mathbf{x} = \boldsymbol{0}$. Hence $(\mathbf{A} - \lambda\mathbf{I})$ is singular, and it therefore must have a zero determinant. This is the property most often used to compute eigenvalues by hand.
The determinant $\det(\mathbf{A} - \lambda \mathbf{I})$ is called the characteristic polynomial. Its roots are the eigenvalues, so we know that an $n\times n$ matrix has $n$ eigenvalues, counting algebraic multiplicity.
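For a small concrete case (an added illustration, not from the text), the characteristic polynomial can be worked out by hand and its roots compared with a library eigenvalue computation.

using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]
# det(A - λI) = (2 - λ)(3 - λ) - 1 = λ^2 - 5λ + 5, with roots (5 ± √5)/2.
byhand = sort([(5 - sqrt(5)) / 2, (5 + sqrt(5)) / 2])
@show byhand;
@show eigvals(A);     # agrees with the roots, up to roundoff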
Suppose that $\mathbf{A}\mathbf{v}_k=\lambda_k\mathbf{v}_k$ for $k=1,\ldots,n$. We can summarize these equations as
$$
\begin{split}
\begin{bmatrix}
\mathbf{A}\mathbf{v}_1 & \mathbf{A}\mathbf{v}_2 & \cdots & \mathbf{A}\mathbf{v}_n
\end{bmatrix}
&=
\begin{bmatrix}
\lambda_1 \mathbf{v}_1 & \lambda_2\mathbf{v}_2 & \cdots & \lambda_n \mathbf{v}_n
\end{bmatrix}, \\[1mm]
\mathbf{A} \begin{bmatrix}
\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n
\end{bmatrix}
&=
\begin{bmatrix}
\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n
\end{bmatrix}
\begin{bmatrix}
\lambda_1 & & & \\
& \lambda_2 & & \\
& & \ddots & \\
& & & \lambda_n
\end{bmatrix},
\end{split}
$$

which we write as
$$\mathbf{A} \mathbf{V} = \mathbf{V} \mathbf{D}.$$

If we find that $\mathbf{V}$ is a nonsingular matrix, then we arrive at a key factorization.[1]
Definition 7.2.3 (Eigenvalue decomposition (EVD))

An eigenvalue decomposition (EVD) of a square matrix $\mathbf{A}$ is

$$\mathbf{A} = \mathbf{V} \mathbf{D} \mathbf{V}^{-1}.$$

If $\mathbf{A}$ has an EVD, we say that $\mathbf{A}$ is diagonalizable; otherwise $\mathbf{A}$ is nondiagonalizable (or defective).
Observe that if $\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$ for nonzero $\mathbf{v}$, then the equation remains true for any nonzero multiple of $\mathbf{v}$. Therefore, eigenvectors are not unique, and thus neither is an EVD.
We stress that while (7.2.6) is possible for all square matrices, (7.2.7) is not. One simple example of a nondiagonalizable matrix is
$$\mathbf{B} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$

There is a common circumstance in which we can guarantee that an EVD exists: if the $n\times n$ matrix $\mathbf{A}$ has $n$ distinct eigenvalues, then $\mathbf{A}$ is diagonalizable. The proof of this standard theorem can be found in many elementary texts on linear algebra.
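As a quick numerical check on this example (added here; not one of the text's demos), $\mathbf{B}-\mathbf{I}$ has rank 1, so the double eigenvalue 1 has only a one-dimensional eigenspace, and the computed eigenvector matrix is essentially singular.

using LinearAlgebra

B = [1 1; 0 1]
λ, V = eigen(B)
@show λ;                 # the eigenvalue 1 appears twice
@show rank(B - I);       # rank 1: only one independent eigenvector
@show cond(V);           # enormous: V is numerically singular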
Example 7.2.2 (Eigenvalues and eigenvectors)
Example 7.2.2 (Julia). The eigvals function returns a vector of the eigenvalues of a matrix.
A = pi * ones(2, 2)
2×2 Matrix{Float64}:
3.14159 3.14159
3.14159 3.14159
λ = eigvals(A)
2-element Vector{Float64}:
0.0
6.283185307179586
If you want the eigenvectors as well, use eigen.
λ, V = eigen(A)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
2-element Vector{Float64}:
0.0
6.283185307179586
vectors:
2×2 Matrix{Float64}:
-0.707107 0.707107
0.707107 0.707107
norm(A * V[:, 2] - λ[2] * V[:, 2])
Both functions allow you to sort the eigenvalues by specified criteria.
A = diagm(-2.3:1.7)
@show eigvals(A, sortby=real);
@show eigvals(A, sortby=abs);
eigvals(A, sortby = real) = [-2.3, -1.3, -0.3, 0.7, 1.7]
eigvals(A, sortby = abs) =
[-0.3, 0.7, -1.3, 1.7, -2.3]
If the matrix is not diagonalizable, no message is given, but $\mathbf{V}$ will be singular. The robust way to detect that circumstance is via $\kappa(\mathbf{V})$.
A = [-1 1; 0 -1]
λ, V = eigen(A)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
2-element Vector{Float64}:
-1.0
-1.0
vectors:
2×2 Matrix{Float64}:
1.0 -1.0
0.0 2.22045e-16
Even in the nondiagonalizable case, $\mathbf{A}\mathbf{V} = \mathbf{V}\mathbf{D}$ holds.
opnorm(A * V - V * diagm(λ))
Example 7.2.2 (MATLAB). The eig function with one output argument returns a vector of the eigenvalues of a matrix.
A = pi * ones(2, 2);
lambda = eig(A)
With two output arguments given, eig returns a matrix of eigenvectors and a diagonal matrix of the eigenvalues.
[V, D] = eig(A)
We can check the fact that this is an EVD .
norm( A - V*D/V ) % / V is like * inv(V)
If the matrix is not diagonalizable, no message is given, but $\mathbf{V}$ will be singular. The robust way to detect that circumstance is via $\kappa(\mathbf{V})$.
A = [-1 1; 0 -1];
[V, D] = eig(A)
Even in the nondiagonalizable case, $\mathbf{A}\mathbf{V} = \mathbf{V}\mathbf{D}$ holds.
Example 7.2.2 (Python). The eig function from numpy.linalg returns a vector of eigenvalues and a matrix of associated eigenvectors.
from numpy.linalg import eig
A = pi * ones([2, 2])
d, V = eig(A)
print("eigenvalues:", d)
eigenvalues: [ 6.28318531e+00 -6.97573700e-16]
We can check the fact that this is an EVD (although in practice we never invert a matrix).
from numpy.linalg import inv
D = diag(d)
print(f"should be near zero: {norm(A - V @ D @ inv(V), 2):.2e}")
should be near zero: 1.26e-15
If the matrix is not diagonalizable, no message is given, but $\mathbf{V}$ will be singular. The robust way to detect that circumstance is via $\kappa(\mathbf{V})$.
from numpy.linalg import cond
A = array([[1, 1], [0, 1]])
d, V = eig(A)
print(f"cond(V) is {cond(V):.2e}")
But even in the nondiagonalizable case, $\mathbf{A}\mathbf{V} = \mathbf{V}\mathbf{D}$ holds up to roundoff error.
print(f"should be near zero: {norm(A @ V - V @ diag(d), 2):.2e}")
should be near zero: 2.22e-16
7.2.3 Similarity and matrix powers

The particular relationship between matrices $\mathbf{A}$ and $\mathbf{D}$ in (7.2.7) is important.
Definition 7.2.4 (Similar matrices)
If $\mathbf{S}$ is any nonsingular matrix, we say that $\mathbf{B}=\mathbf{S}\mathbf{A}\mathbf{S}^{-1}$ is a similarity transformation of $\mathbf{A}$, and we say that $\mathbf{B}$ is similar to $\mathbf{A}$.
A similarity transformation does not change eigenvalues, a fact that is typically proved in elementary linear algebra texts.
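A quick numerical illustration of this fact (added here, with arbitrary matrices): apply a random similarity transformation and compare the eigenvalue lists.

using LinearAlgebra

A = randn(4, 4)
S = randn(4, 4)          # almost surely nonsingular
B = S * A / S            # similarity transformation S*A*inv(S)
@show eigvals(A);
@show eigvals(B);        # same values, up to roundoff (both lists are sorted the same way)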
The EVD is especially useful for matrix powers. To begin,
$$\mathbf{A}^2=(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})=\mathbf{V}\mathbf{D}(\mathbf{V}^{-1}\mathbf{V})\mathbf{D}\mathbf{V}^{-1}=\mathbf{V}\mathbf{D}^2\mathbf{V}^{-1}.$$

Multiplying this result by $\mathbf{A}$ repeatedly, we find that

$$\mathbf{A}^k = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}.$$

Because $\mathbf{D}$ is diagonal, its power $\mathbf{D}^k$ is just the diagonal matrix of the $k$th powers of the eigenvalues.
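Here is a minimal check of this identity (an added sketch with an arbitrary matrix, not part of the text).

using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]
λ, V = eigen(A)
k = 5
@show opnorm(A^k - V * Diagonal(λ .^ k) / V);   # residual at roundoff level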
Furthermore, given a polynomial $p(z)=c_0+c_1 z + \cdots + c_m z^m$, we can apply the polynomial to the matrix in a straightforward way,

$$p(\mathbf{A}) = c_0\mathbf{I} +c_1 \mathbf{A} + \cdots + c_m \mathbf{A}^m.$$

Applying (7.2.10) leads to
$$
\begin{split}
p(\mathbf{A}) & = c_0\mathbf{V}\mathbf{V}^{-1} +c_1 \mathbf{V}\mathbf{D}\mathbf{V}^{-1} + \cdots + c_m \mathbf{V}\mathbf{D}^m\mathbf{V}^{-1} \\
&= \mathbf{V} \cdot [ c_0\mathbf{I} +c_1 \mathbf{D} + \cdots + c_m \mathbf{D}^m] \cdot \mathbf{V}^{-1} \\[1mm]
&= \mathbf{V} \cdot \begin{bmatrix}
p(\lambda_1) & & & \\ & p(\lambda_2) & & \\ & & \ddots & \\ & & & p(\lambda_n)
\end{bmatrix} \cdot \mathbf{V}^{-1}.
\end{split}
$$

Finally, given the convergence of Taylor polynomials to common functions, we are able to apply a function $f$ to a square matrix by replacing $p$ with $f$ in (7.2.11).
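For example (an added sketch, not from the text), the exponential of a diagonalizable matrix can be computed by applying exp to the eigenvalues, and the result matches Julia's built-in matrix exponential.

using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]
λ, V = eigen(A)
expA = V * Diagonal(exp.(λ)) / V    # f(A) = V * f(D) * inv(V) with f = exp
@show opnorm(expA - exp(A));        # exp of a matrix argument is the matrix exponential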
7.2.4 Conditioning of eigenvalues

Just as linear systems have condition numbers that quantify the effect of finite precision, eigenvalue problems may be poorly conditioned too. While many possible results can be derived, we will use just one, the Bauer–Fike theorem.
Theorem 7.2.3 (Bauer–Fike)
Let $\mathbf{A}\in\mathbb{C}^{n\times n}$ be diagonalizable, $\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}$, with eigenvalues $\lambda_1,\ldots,\lambda_n$. If $\mu$ is an eigenvalue of $\mathbf{A}+\mathbf{E}$ for a complex matrix $\mathbf{E}$, then

$$\min_{j=1,\ldots,n} |\mu - \lambda_j| \le \kappa(\mathbf{V}) \, \| \mathbf{E} \|,$$

where $\|\cdot\|$ and $\kappa$ are in the 2-norm.
The Bauer–Fike theorem tells us that eigenvalues can be perturbed by an amount that is $\kappa(\mathbf{V})$ times larger than perturbations to the matrix. This result is a bit less straightforward than it might seem: eigenvectors are not unique, so there are multiple possible values for $\kappa(\mathbf{V})$. Even so, the theorem indicates caution when a matrix has eigenvectors that form an ill-conditioned matrix. The limiting case of $\kappa(\mathbf{V})=\infty$ might be interpreted as indicating a nondiagonalizable matrix $\mathbf{A}$. The other extreme is also of interest: $\kappa(\mathbf{V})=1$, which implies that $\mathbf{V}$ is unitary. A matrix that has an EVD with a unitary eigenvector matrix is called normal.

As we will see in Symmetry and definiteness, hermitian and real symmetric matrices are normal. Since the condition number of a unitary matrix is equal to 1, (7.2.13) guarantees that a perturbation of a normal matrix changes the eigenvalues by the same amount or less.
Example 7.2.3 (Eigenvalue conditioning)
Example 7.2.3 (Julia). We first define a hermitian matrix. Note that the ' operation is the adjoint and includes complex conjugation.
n = 7
A = randn(n, n) + 1im * randn(n, n)
A = (A + A') / 2
7×7 Matrix{ComplexF64}:
0.325505+0.0im 0.490234-0.743158im … 0.396406+0.590502im
0.490234+0.743158im 1.85613+0.0im -0.547247+1.12398im
0.515176-1.17467im 0.404081+0.0817125im -0.457196-0.442748im
0.1643-1.43759im -0.438945+0.207581im -0.988933-0.24625im
0.396863+0.555028im 0.363285-0.815487im -0.572039+0.129132im
-1.23095+1.61663im 0.620866-0.0761188im … -0.469493-0.741019im
0.396406-0.590502im -0.547247-1.12398im -1.19368+0.0im
We confirm that the matrix $\mathbf{A}$ is normal by checking that $\kappa(\mathbf{V}) = 1$ (to within roundoff).
λ, V = eigen(A)
@show cond(V);
cond(V) = 1.0000000000000007
Now we perturb $\mathbf{A}$ and measure the effect on the eigenvalues. The Bauer–Fike theorem uses absolute differences, not relative ones.
Tip
Since the ordering of eigenvalues can change, we look at all pairwise differences and take the minima.
ΔA = 1e-8 * normalize(randn(n, n) + 1im * randn(n, n))
λ̃ = eigvals(A + ΔA)
dist = minimum([abs(x - y) for x in λ̃, y in λ], dims=2)
7×1 Matrix{Float64}:
1.1607368132587166e-9
8.506184094346277e-10
1.901690883719605e-9
1.2162453210897967e-9
1.3795003762031114e-9
1.1221499944880294e-9
1.2396386976731458e-9
As promised, the perturbations in the eigenvalues do not exceed the normwise perturbation to the original matrix.
Now we see what happens for a triangular matrix.
n = 20
x = 1:n
A = triu(x * ones(n)')
A[1:5, 1:5]
5×5 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0
0.0 2.0 2.0 2.0 2.0
0.0 0.0 3.0 3.0 3.0
0.0 0.0 0.0 4.0 4.0
0.0 0.0 0.0 0.0 5.0
This matrix is not especially close to normal.
λ, V = eigen(A)
@show cond(V);
cond(V) = 6.149906664929389e9
As a result, the eigenvalues can change by a good deal more.
ΔA = 1e-8 * normalize(randn(n, n) + 1im * randn(n, n))
λ̃ = eigvals(A + ΔA)
dist = minimum([abs(x - y) for x in λ̃, y in λ], dims=2)
BF_bound = cond(V) * norm(ΔA)
@show maximum(dist), BF_bound;
(maximum(dist), BF_bound) = (0.494851347444697, 61.499066649293894)
If we plot the eigenvalues of many perturbations, we get a cloud of points that roughly represents all the possible eigenvalues when representing this matrix with single-precision accuracy.
using Plots
plt = scatter(λ, zeros(n), aspect_ratio=1)
for _ in 1:200
ΔA = eps(Float32) * normalize(randn(n, n) + 1im * randn(n, n))
λ̃ = eigvals(A + ΔA)
scatter!(real(λ̃), imag(λ̃), m=1, color=:black)
end
plt
The plot shows that some eigenvalues are much more affected than others. This situation is not unusual, but it is not explained by the Bauer–Fike theorem.
Example 7.2.3 (MATLAB). We first define a hermitian matrix. Note that the ' operation is the adjoint and includes complex conjugation.
n = 7;
A = randn(n, n) + 1i * randn(n, n);
A = (A + A') / 2;
We confirm that the matrix $\mathbf{A}$ is normal by checking that $\kappa(\mathbf{V}) = 1$ (to within roundoff).
[V, D] = eig(A);
lambda = diag(D);
cond(V)
Now we perturb $\mathbf{A}$ and measure the effect on the eigenvalues. The Bauer–Fike theorem uses absolute differences, not relative ones. Note: since the ordering of eigenvalues can change, we look at all pairwise differences and take the minima.
E = randn(n, n) + 1i * randn(n, n);
E = 1e-8 * E / norm(E);
dd = eig(A + E);
dist = [];
for j = 1:n
dist = [dist; min(abs(dd - lambda(j)))];
end
dist
As promised, the perturbations in the eigenvalues do not exceed the normwise perturbation to the original matrix.
Now we see what happens for a triangular matrix.
n = 20;
x = (1:n)';
A = triu(x * ones(1, n));
A(1:5, 1:5)
This matrix is not at all close to normal.
[V, D] = eig(A);
lambda = diag(D);
cond(V)
As a result, the eigenvalues can change by a good deal more.
E = randn(n, n) + 1i * randn(n, n);
E = 1e-8 * E / norm(E);
dd = eig(A + E);
dist = -Inf;
for j = 1:n
dist = max(dist, min(abs(dd - lambda(j))));
end
fprintf("max change in eigenvalues: %.2e", dist)
fprintf("Bauer-Fike upper bound: %.2e", cond(V) * norm(E))
max change in eigenvalues:
If we plot the eigenvalues of many perturbations, we get a cloud of points that roughly represents all the possible eigenvalues when representing this matrix with single-precision accuracy.
clf
plot(lambda, 0*lambda, 'o')
axis equal; hold on
for k = 1:60
E = randn(n, n) + 1i * randn(n, n);
E = eps(single(1)) * E / norm(E);
dd = eig(A + E);
plot(real(dd), imag(dd), 'k.', markersize=2)
end
The plot shows that some eigenvalues are much more affected than others. This situation is not unusual, but it is not explained by the Bauer–Fike theorem.
Example 7.2.3 (Python). We first define a hermitian matrix. Note that we add the conjugate transpose of a matrix to itself.
n = 7
A = random.randn(n, n) + 1j * random.randn(n, n)
A = (A + conj(A.T)) / 2
We confirm that the matrix $\mathbf{A}$ is normal by checking that $\kappa(\mathbf{V}) = 1$ (to within roundoff).
from numpy.linalg import eig
d, V = eig(A)
print(f"eigenvector matrix has condition number {cond(V):.5f}")
eigenvector matrix has condition number 1.00000
Now we perturb $\mathbf{A}$ and measure the effect on the eigenvalues. Note that the Bauer–Fike theorem uses absolute differences, not relative ones. Since the ordering of eigenvalues can change, we look at all pairwise differences and take the minima.
E = random.randn(n, n) + 1j * random.randn(n, n)
E = 1e-8 * E / norm(E, 2)
dd, _ = eig(A + E)
dist = array([min([abs(x - y) for x in dd]) for y in d])
print(dist)
[9.13062970e-10 3.34658175e-09 7.06122371e-10 4.31937276e-10
1.05432865e-09 3.97864390e-09 2.06839348e-09]
As promised, the perturbations in the eigenvalues do not exceed the normwise perturbation to the original matrix.
Now we see what happens for a triangular matrix.
n = 20
x = arange(n) + 1
A = triu(outer(x, ones(n)))
print(A[:5, :5])
[[1. 1. 1. 1. 1.]
[0. 2. 2. 2. 2.]
[0. 0. 3. 3. 3.]
[0. 0. 0. 4. 4.]
[0. 0. 0. 0. 5.]]
This matrix is not at all close to normal.
d, V = eig(A)
print(f"eigenvector matrix has condition number {cond(V):.2e}")
eigenvector matrix has condition number 6.15e+09
As a result, the eigenvalues can change by a good deal more.
E = random.randn(n, n) + 1j * random.randn(n, n)
E = 1e-8 * E / norm(E, 2)
dd, _ = eig(A + E)
dist = array([min([abs(x - y) for x in dd]) for y in d])
print(f"Maximum eigenvalue change is {max(dist):.2e}")
print(f"The Bauer-Fike upper bound is {cond(V) * norm(E, 2):.2e}")
Maximum eigenvalue change is 8.18e-01
The Bauer-Fike upper bound is 6.15e+01
If we plot the eigenvalues of many perturbations, we get a cloud of points that roughly represents all the possible eigenvalues when representing this matrix with single-precision accuracy.
clf
scatter(d, zeros(n), 18)
axis("equal")
for _ in range(100):
E = random.randn(n, n) + 1j * random.randn(n, n)
E = finfo(np.float32).eps * E / norm(E, 2)
dd, _ = eig(A + E)
scatter(real(dd), imag(dd), 2, 'k')
The plot shows that some eigenvalues are much more affected than others. This situation is not unusual, but it is not explained by the Bauer–Fike theorem.
7.2.5 Computing the EVD

Roots of the characteristic polynomial are not used in numerical methods for finding eigenvalues.[2] Practical algorithms for computing the EVD go beyond the scope of this book. The essence of the matter is the connection to matrix powers indicated in (7.2.10). (We will see much more about the importance of matrix powers in Chapter 8.)
If the eigenvalues have different complex magnitudes, then as $k\to\infty$ the entries on the diagonal of $\mathbf{D}^k$ become increasingly well separated and easy to pick out. It turns out that there is an astonishingly easy and elegant way to accomplish this separation without explicitly computing the matrix powers.
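The growing separation is easy to see directly (a small added sketch with arbitrary eigenvalues): the ratio between the largest and next-largest diagonal magnitudes of $\mathbf{D}^k$ grows geometrically with $k$.

using LinearAlgebra

D = Diagonal([0.5, -1.0, 2.0])
for k in (1, 5, 10, 20)
    d = sort(abs.(diag(D^k)), rev=true)
    @show k, d[1] / d[2];    # dominance of the largest eigenvalue increases with k
end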
Example 7.2.4 (Francis QR iteration)
Example 7.2.4 (Julia). Let’s start with a known set of eigenvalues and an orthogonal eigenvector basis.
D = diagm([-6, -1, 2, 4, 5])
V, R = qr(randn(5, 5)) # V is unitary
A = V * D * V'
5×5 Matrix{Float64}:
3.23016 0.376963 -1.19976 -0.659159 0.169185
0.376963 0.688255 -0.843471 -2.79237 -3.01583
-1.19976 -0.843471 -0.512361 0.120006 -1.97367
-0.659159 -2.79237 0.120006 -1.38242 -2.99132
0.169185 -3.01583 -1.97367 -2.99132 1.97636
eigvals(A)
5-element Vector{Float64}:
-6.000000000000002
-0.9999999999999978
1.9999999999999993
3.9999999999999982
4.999999999999995
Now we will take the QR factorization and just reverse the factors.
Q, R = qr(A)
A = R * Q
It turns out that this is a similarity transformation, so the eigenvalues are unchanged.
eigvals(A)
5-element Vector{Float64}:
-5.999999999999998
-1.0000000000000002
2.0000000000000004
3.999999999999998
5.000000000000002
What’s remarkable, and not elementary, is that if we repeat this transformation many times, the resulting matrix converges to $\mathbf{D}$.
for k in 1:40
Q, R = qr(A)
A = R * Q
end
A
5×5 Matrix{Float64}:
-5.99982 -0.0445231 4.5239e-7 2.09391e-15 1.01254e-15
-0.0445231 4.99982 0.000186824 3.73623e-15 3.97368e-16
4.5239e-7 0.000186824 4.0 4.98983e-12 -2.41858e-16
7.69591e-20 7.14865e-16 4.98951e-12 2.0 2.6241e-12
-4.03397e-31 2.47249e-28 -3.87812e-24 2.62456e-12 -1.0
Example 7.2.4 (MATLAB). Let’s start with a known set of eigenvalues and an orthogonal eigenvector basis.
D = diag([-6, -1, 2, 4, 5]);
[V, R]= qr(randn(5, 5)); % V is unitary
A = V * D * V';
Now we will take the QR factorization and just reverse the factors.
[Q, R] = qr(A);
A = R * Q;
It turns out that this is a similarity transformation, so the eigenvalues are unchanged.
What’s remarkable, and not elementary, is that if we repeat this transformation many times, the resulting matrix converges to $\mathbf{D}$.
for k = 1:40
[Q, R] = qr(A);
A = R * Q;
end
format short e
A
Example 7.2.4 (Python). Let’s start with a known set of eigenvalues and an orthogonal eigenvector basis.
from numpy.linalg import qr
D = diag([-6, -1, 2, 4, 5])
V, R = qr(random.randn(5, 5))
A = V @ D @ V.T # note that V.T = inv(V) here
Now we will take the QR factorization and just reverse the factors.
Q, R = qr(A)
A = R @ Q
It turns out that this is a similarity transformation, so the eigenvalues are unchanged.
What’s remarkable, and not elementary, is that if we repeat this transformation many times, the resulting matrix converges to $\mathbf{D}$.
for k in range(40):
Q, R = qr(A)
A = R @ Q
set_printoptions(precision=4)
print(A)
[[-6.0000e+00 -6.3163e-05 -7.7494e-07 8.0433e-16 -6.6814e-16]
[-6.3163e-05 5.0000e+00 3.4480e-03 1.3223e-15 -1.2786e-16]
[-7.7494e-07 3.4480e-03 4.0000e+00 -2.8128e-13 2.4507e-16]
[ 7.1896e-20 6.4328e-16 -2.8163e-13 2.0000e+00 3.8647e-13]
[-3.5579e-32 -9.6403e-29 1.2418e-26 3.8653e-13 -1.0000e+00]]
The process demonstrated in Demo 7.2.4 is known as the Francis QR iteration, and it can be formulated as an $O(n^3)$ algorithm for finding the EVD. It forms the basis of most practical eigenvalue computations, at least until the matrix size approaches $10^4$ or so.
7.2.6 Exercises

(a) ✍ Suppose that matrix $\mathbf{A}$ has an eigenvalue $\lambda$. Show that for any induced matrix norm, $\| \mathbf{A} \|\ge |\lambda|$.
(b) ✍ Find a matrix $\mathbf{A}$ such that $\| \mathbf{A} \|_2$ is strictly larger than $|\lambda|$ for all eigenvalues $\lambda$. (Proof-by-computer isn’t allowed here. You don’t need to compute $\| \mathbf{A} \|_2$ exactly, just a lower bound for it.)
✍ Prove that the matrix $\mathbf{B}$ in (7.2.8) does not have two independent eigenvectors.
⌨ Find the eigenvalues of each matrix. Then, for each eigenvalue $\lambda$, use rank to verify that $\lambda\mathbf{I}$ minus the given matrix is singular.
$$
\mathbf{A} = \begin{bmatrix}
2 & -1 & 0 \\
-1 & 2 & -1 \\
0 & -1 & 2
\end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix}
2 & -1 & -1 \\
-2 & 2 & -1 \\
-1 & -2 & 2
\end{bmatrix}, \qquad
\mathbf{C} = \begin{bmatrix}
2 & -1 & -1 \\
-1 & 2 & -1 \\
-1 & -1 & 2
\end{bmatrix},
$$

$$
\mathbf{D} = \begin{bmatrix}
3 & 1 & 0 & 0 \\
1 & 3 & 1 & 0 \\
0 & 1 & 3 & 1 \\
0 & 0 & 1 & 3
\end{bmatrix}, \qquad
\mathbf{E} = \begin{bmatrix}
4 & -3 & -2 & -1 \\
-2 & 4 & -2 & -1 \\
-1 & -2 & 4 & -1 \\
-1 & -2 & -1 & 4
\end{bmatrix}
$$
(a) ✍ Show that the eigenvalues of a diagonal $n\times n$ matrix $\mathbf{D}$ are the diagonal entries of $\mathbf{D}$. (That is, produce the associated eigenvectors.)
(b) ✍ The eigenvalues of a triangular matrix are its diagonal entries. Prove this in the $3\times 3$ case,
$$
\mathbf{T} = \begin{bmatrix}
t_{11} & t_{12} & t_{13} \\
0 & t_{22} & t_{23} \\
0 & 0 & t_{33}
\end{bmatrix},
$$

by finding the eigenvectors. (Start by showing that $[1,0,0]^T$ is an eigenvector. Then show how to make $[a,1,0]^T$ an eigenvector, except for one case that does not change the outcome. Continue the same logic for $[a,b,1]^T$.)
✍ Let $\mathbf{A}=\displaystyle\frac{\pi}{6}\begin{bmatrix} 4 & 1 \\ 4 & 4 \end{bmatrix}$.
(a) Show that

$$\lambda_1=\pi,\quad \mathbf{v}_1=\begin{bmatrix}1 \\ 2 \end{bmatrix}, \qquad \lambda_2=\frac{\pi}{3},\quad \mathbf{v}_2=\begin{bmatrix}1 \\ -2 \end{bmatrix}$$

yield an EVD of $\mathbf{A}$.
(b) Use (7.2.12) to evaluate $p(\mathbf{A})$, where $p(x) = (x-\pi)^4$.
(c) Use the function analog of (7.2.12) to evaluate $\cos(\mathbf{A})$.
⌨ In Exercise 2.3.5, you showed that the displacements of point masses placed along a string satisfy a linear system $\mathbf{A}\mathbf{q}=\mathbf{f}$ for an $(n-1)\times(n-1)$ matrix $\mathbf{A}$. The eigenvalues and eigenvectors of $\mathbf{A}$ correspond to resonant frequencies and modes of vibration of the string. For $n=40$ and the physical parameters given in part (b) of that exercise, find the eigenvalue decomposition of $\mathbf{A}$. Report the three eigenvalues with smallest absolute value, and plot all three associated eigenvectors on a single graph (as functions of the vector row index).
⌨ Demo 7.2.4 suggests that the result of the Francis QR iteration as $k\to\infty$ sorts the eigenvalues on the diagonal according to a particular ordering. Following the code there as a model, create a random matrix with eigenvalues equal to $-9.6,-8.6,\ldots,10.4$, perform the iteration 200 times, and check whether the sorting criterion holds in your experiment as well.
⌨ Eigenvalues of random matrices and their perturbations can be very interesting.
(a) Let A=randn(60,60).[3] Scatter plot its eigenvalues in the complex plane, using a plot aspect ratio of 1 and red diamonds as markers.
(b) Let $\mathbf{E}$ be another random $60\times 60$ matrix, and on top of the previous graph, plot the eigenvalues of $\mathbf{A}+0.05\mathbf{E}$ as blue dots. Repeat this for 100 different values of $\mathbf{E}$.
(c) Let T=triu(A). On a new graph, scatter plot the eigenvalues of $\mathbf{T}$ in the complex plane. (They all lie on the real axis.)
(d) Repeat part (b) with $\mathbf{T}$ in place of $\mathbf{A}$.
(e) Compute some condition numbers and apply Theorem 7.2.3 to explain the dramatic difference between your plots with respect to the dot distributions.