We now introduce another factorization that is as fundamental as the EVD .
Definition 7.3.1 (Singular value decomposition (
SVD ))
The singular value decomposition of an m × n m\times n m × n matrix A \mathbf{A} A is
A = U S V ∗ , \mathbf{A} = \mathbf{U} \mathbf{S} \mathbf{V}^*, A = US V ∗ , where U ∈ C m × m \mathbf{U}\in\mathbb{C}^{m\times m} U ∈ C m × m and V ∈ C n × n \mathbf{V}\in\mathbb{C}^{n\times n} V ∈ C n × n are unitary and S ∈ R m × n \mathbf{S}\in\mathbb{R}^{m\times n} S ∈ R m × n is real and diagonal with nonnegative elements.
The columns of U \mathbf{U} U and V \mathbf{V} V are called left and right singular vectors , respectively. The diagonal elements of S \mathbf{S} S , written σ 1 , … , σ r \sigma_1,\ldots,\sigma_r σ 1 , … , σ r , for r = min { m , n } r=\min\{m,n\} r = min { m , n } , are called the singular values of A \mathbf{A} A and are ordered so that
σ 1 ≥ σ 2 ≥ ⋯ ≥ σ r ≥ 0 , r = min { m , n } . \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r\ge 0, \qquad r=\min\{m,n\}. σ 1 ≥ σ 2 ≥ ⋯ ≥ σ r ≥ 0 , r = min { m , n } . We call σ 1 \sigma_1 σ 1 the principal singular value and u 1 \mathbf{u}_{1} u 1 and v 1 \mathbf{v}_{1} v 1 the principal singular vectors .
Every m × n m\times n m × n matrix has an SVD . The singular values of a matrix are unique, but the singular vectors are not. If the matrix is real, then U \mathbf{U} U and V \mathbf{V} V in (7.3.1) can be chosen to be real, orthogonal matrices.
It is easy to check that
[ 3 4 ] = ( 1 5 [ 3 − 4 4 3 ] ) ⋅ [ 5 0 ] ⋅ [ 1 ] \begin{bmatrix} 3 \\ 4 \end{bmatrix}
= \left(\frac{1}{5} \begin{bmatrix}
3 & -4 \\ 4 & 3
\end{bmatrix}\,\right) \cdot \begin{bmatrix}
5 \\ 0
\end{bmatrix}\cdot \begin{bmatrix}
1
\end{bmatrix} [ 3 4 ] = ( 5 1 [ 3 4 − 4 3 ] ) ⋅ [ 5 0 ] ⋅ [ 1 ] meets all the requirements of an SVD . Interpreted as a matrix, the vector [ 3 , 4 ] [3,4] [ 3 , 4 ] has the lone singular value 5.
Suppose A \mathbf{A} A is a real matrix and that A = U S V T \mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^T A = US V T is an SVD . Then A T = V S T U T \mathbf{A}^T=\mathbf{V}\mathbf{S}^T\mathbf{U}^T A T = V S T U T meets all the requirements of an SVD for A T \mathbf{A}^T A T : the first and last matrices are orthogonal, and the middle matrix is diagonal with nonnegative elements. Hence A \mathbf{A} A and A T \mathbf{A}^T A T have the same singular values.
7.3.1 Connections to the EVD ¶ Let A = U S V ∗ \mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^* A = US V ∗ be m × n m\times n m × n , and compute the square hermitian matrix B = A ∗ A \mathbf{B}=\mathbf{A}^*\mathbf{A} B = A ∗ A :
B = ( V S ∗ U ∗ ) ( U S V ∗ ) = V S ∗ S V ∗ = V ( S T S ) V − 1 . \mathbf{B} = (\mathbf{V}\mathbf{S}^*\mathbf{U}^*) (\mathbf{U}\mathbf{S}\mathbf{V}^*) = \mathbf{V}\mathbf{S}^*\mathbf{S}\mathbf{V}^* = \mathbf{V}(\mathbf{S}^T\mathbf{S})\mathbf{V}^{-1}. B = ( V S ∗ U ∗ ) ( US V ∗ ) = V S ∗ S V ∗ = V ( S T S ) V − 1 . Note that S T S \mathbf{S}^T\mathbf{S} S T S is a diagonal n × n n \times n n × n matrix. There are two cases to consider. If m ≥ n m \ge n m ≥ n , then
S T S = [ σ 1 2 ⋱ σ n 2 ] . \mathbf{S}^T\mathbf{S} =
\begin{bmatrix}
\sigma_1^2 & & \\
& \ddots & \\
& & \sigma_n^2
\end{bmatrix}. S T S = ⎣ ⎡ σ 1 2 ⋱ σ n 2 ⎦ ⎤ . On the other hand, if m < n m<n m < n , then
S T S = [ σ 1 2 ⋱ σ m 2 0 ] . \mathbf{S}^T\mathbf{S} =
\begin{bmatrix}
\sigma_1^2 & & & \\
& \ddots & & \\
& & \sigma_m^2 & \\
& & & \boldsymbol{0}
\end{bmatrix}. S T S = ⎣ ⎡ σ 1 2 ⋱ σ m 2 0 ⎦ ⎤ . Except for some unimportant technicalities, the eigenvectors of A ∗ A \mathbf{A}^*\mathbf{A} A ∗ A , when appropriately ordered and normalized, are right singular vectors of A \mathbf{A} A . The left singular vectors could then be deduced from the identity A V = U S \mathbf{A}\mathbf{V} = \mathbf{U}\mathbf{S} AV = US .
Another close connection between EVD and SVD comes via the ( m + n ) × ( m + n ) (m+n)\times (m+n) ( m + n ) × ( m + n ) matrix
C = [ 0 A ∗ A 0 ] . \mathbf{C} =
\begin{bmatrix}
0 & \mathbf{A}^* \\ \mathbf{A} & 0
\end{bmatrix}. C = [ 0 A A ∗ 0 ] . If σ is a singular value of B \mathbf{B} B , then σ and − σ -\sigma − σ are eigenvalues of C \mathbf{C} C , and the associated eigenvector immediately reveals a left and a right singular vector (see Exercise 11 ). This connection is implicitly exploited by software to compute the SVD .
7.3.2 Interpreting the SVD ¶ Another way to write A = U S V ∗ \mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^* A = US V ∗ is
A V = U S . \mathbf{A}\mathbf{V}=\mathbf{U}\mathbf{S}. AV = US . Taken columnwise, this equation means
A v k = σ k u k , k = 1 , … , r = min { m , n } . \mathbf{A} \mathbf{v}_{k} = \sigma_k \mathbf{u}_{k}, \qquad k=1,\ldots,r=\min\{m,n\}. A v k = σ k u k , k = 1 , … , r = min { m , n } . In words, each right singular vector is mapped by A \mathbf{A} A to a scaled version of its corresponding left singular vector; the magnitude of scaling is its singular value.
Both the SVD and the EVD describe a matrix in terms of some special vectors and a small number of scalars. Table 7.3.1 summarizes the key differences. The SVD sacrifices having the same basis in both source and image spaces—after all, they may not even have the same dimension—but as a result gains orthogonality in both spaces.
Table 7.3.1: Comparison of the EVD and SVD
EVD SVD exists for most square matrices exists for all rectangular and square matrices A x k = λ k x k \mathbf{A}\mathbf{x}_k = \lambda_k \mathbf{x}_k A x k = λ k x k A v k = σ k u k \mathbf{A} \mathbf{v}_k = \sigma_k \mathbf{u}_k A v k = σ k u k same basis for domain and range of A \mathbf{A} A two orthonormal bases may have poor conditioning perfectly conditioned
In The QR factorization we saw that a matrix has both full and thin forms of the QR factorization. A similar situation holds with the SVD .
Suppose A \mathbf{A} A is m × n m\times n m × n with m > n m > n m > n and let A = U S V ∗ \mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^* A = US V ∗ be an SVD . The last m − n m-n m − n rows of S \mathbf{S} S are all zero due to the fact that S \mathbf{S} S is diagonal. Hence
U S = [ u 1 ⋯ u n u n + 1 ⋯ u m ] [ σ 1 ⋱ σ n 0 ] = [ u 1 ⋯ u n ] [ σ 1 ⋱ σ n ] = U ^ S ^ , \begin{align*}
\mathbf{U} \mathbf{S} & =
\begin{bmatrix}
\mathbf{u}_1 & \cdots & \mathbf{u}_n & \mathbf{u}_{n+1} & \cdots & \mathbf{u}_m
\end{bmatrix}
\begin{bmatrix}
\sigma_1 & & \\
& \ddots & \\
& & \sigma_n \\
& & \\
& \boldsymbol{0} & \\
& &
\end{bmatrix} \\
&=
\begin{bmatrix}
\mathbf{u}_1 & \cdots & \mathbf{u}_n
\end{bmatrix}
\begin{bmatrix}
\sigma_1 & & \\
& \ddots & \\
& & \sigma_n
\end{bmatrix} = \hat{\mathbf{U}} \hat{\mathbf{S}},
\end{align*} US = [ u 1 ⋯ u n u n + 1 ⋯ u m ] ⎣ ⎡ σ 1 ⋱ 0 σ n ⎦ ⎤ = [ u 1 ⋯ u n ] ⎣ ⎡ σ 1 ⋱ σ n ⎦ ⎤ = U ^ S ^ , in which U ^ \hat{\mathbf{U}} U ^ is m × n m\times n m × n and S ^ \hat{\mathbf{S}} S ^ is n × n n\times n n × n . This allows us to define the thin SVD
A = U ^ S ^ V ∗ , \mathbf{A}=\hat{\mathbf{U}}\hat{\mathbf{S}}\mathbf{V}^*, A = U ^ S ^ V ∗ , in which S ^ \hat{\mathbf{S}} S ^ is square and diagonal and U ^ \hat{\mathbf{U}} U ^ is ONC but not square.
The thin form retains all the information about A \mathbf{A} A from the SVD ; the factorization is still an equality, not an approximation. It is computationally preferable when m ≫ n m \gg n m ≫ n , since it requires far less storage than a full SVD . For a matrix with more columns than rows, one can derive a thin form by taking the adjoint of the thin SVD of A ∗ \mathbf{A}^* A ∗ .
7.3.4 SVD and the 2-norm¶ The SVD is intimately connected to the 2-norm, as the following theorem describes.
The conclusion (7.3.13) can be proved by vector calculus. In the square case m = n m=n m = n , A \mathbf{A} A having full rank is identical to being invertible. The SVD is the usual means for computing the 2-norm and condition number of a matrix.
Example 7.3.4 (
SVD properties)
Example 7.3.4 We verify some of the fundamental SVD properties using standard Julia functions from LinearAlgebra
.
A = [i^j for i = 1:5, j = 0:3]
5×4 Matrix{Int64}:
1 1 1 1
1 2 4 8
1 3 9 27
1 4 16 64
1 5 25 125
To get only the singular values, use svdvals
.
4-element Vector{Float64}:
146.69715365883005
5.738569780953702
0.9998486640841027
0.11928082685241923
Here is verification of the connections between the singular values, norm, and condition number.
@show opnorm(A, 2);
@show σ[1];
opnorm(A, 2) = 146.69715365883005
σ[1] = 146.69715365883005
@show cond(A, 2);
@show σ[1] / σ[end];
cond(A, 2) = 1229.846887633767
σ[1] / σ[end] = 1229.846887633767
To get singular vectors as well, use svd
. The thin form of the factorization is the default.
U, σ, V = svd(A);
@show size(U);
@show size(V);
We verify the orthogonality of the singular vectors as follows:
@show opnorm(U' * U - I);
@show opnorm(V' * V - I);
opnorm(U' * U - I) = 4.947764483928556e-16
opnorm(V' * V - I) = 7.570241024277813e-16
Example 7.3.4 We verify some of the fundamental SVD properties using the built-in svd
function.
A = vander(1:5);
A = A(:, 1:4)
[U, S, V] = svd(A);
disp(sprintf("U is %d by %d. S is %d by %d. V is %d by %d.\n", size(U), size(S), size(V)))
U is 5 by 5. S is 5 by 4. V is 4 by 4.
We verify the orthogonality of the singular vectors as follows:
norm(U' * U - eye(5))
norm(V' * V - eye(4))
Here is verification of the connections between the singular values, norm, and condition number.
s = diag(S);
norm_A = norm(A)
sigma_max = s(1)
cond_A = cond(A)
sigma_ratio = s(1) / s(end)
Example 7.3.4 We verify some of the fundamental SVD properties using standard Julia functions from LinearAlgebra
.
A = array([[(i + 1.0) ** j for j in range(4)] for i in range(5)])
set_printoptions(precision=4)
print(A)
[[ 1. 1. 1. 1.]
[ 1. 2. 4. 8.]
[ 1. 3. 9. 27.]
[ 1. 4. 16. 64.]
[ 1. 5. 25. 125.]]
The factorization is obtained using svd
from numpy.linalg
.
from numpy.linalg import svd
U, sigma, Vh = svd(A)
print("singular values:")
print(sigma)
singular values:
[1.4670e+02 5.7386e+00 9.9985e-01 1.1928e-01]
By default, the full factorization type is returned. This can be a memory hog if one of the dimensions of A \mathbf{A} A is very large.
print("size of U:", U.shape)
print("size of V:", Vh.T.shape)
size of U: (5, 5)
size of V: (4, 4)
Both U \mathbf{U} U and V \mathbf{V} V are orthogonal (in the complex case, unitary). Note that it’s V ∗ \mathbf{V}^* V ∗ that is returned, not V \mathbf{V} V .
print(f"should be near zero: {norm(U.T @ U - eye(5), 2):.2e}")
print(f"should be near zero: {norm(Vh @ Vh.T - eye(4), 2):.2e}")
should be near zero: 5.61e-16
should be near zero: 1.07e-15
Next we test that we have the factorization promised by the SVD , using diagsvd
to construct a rectangular diagonal matrix.
from scipy.linalg import diagsvd
S = diagsvd(sigma, 5, 4)
print(f"should be near zero: {norm(A - U @ S @ Vh, 2):.2e}")
should be near zero: 8.56e-14
Here is verification of the connections between the singular values, norm, and condition number.
from numpy.linalg import cond
print("largest singular value:", sigma[0])
print("2-norm of the matrix: ", norm(A, 2))
print("singular value ratio:", sigma[0] / sigma[-1])
print("2-norm condition no.:", cond(A, 2))
largest singular value: 146.69715365883005
2-norm of the matrix: 146.69715365883005
singular value ratio: 1229.8468876337527
2-norm condition no.: 1229.8468876337529
For matrices that are much taller than they are wide, the thin SVD form is more memory-efficient, because U \mathbf{U} U takes the same shape.
A = random.randn(1000, 10)
U, sigma, Vh = svd(A, full_matrices=False)
print("size of U:", U.shape)
print("size of V:", Vh.shape)
size of U: (1000, 10)
size of V: (10, 10)
7.3.5 Exercises ¶ ✍ Each factorization below is algebraically correct. The notation I n \mathbf{I}_n I n means an n × n n\times n n × n identity. In each case, determine whether it is an SVD . If it is, write down σ 1 \sigma_1 σ 1 , u 1 \mathbf{u}_1 u 1 , and v 1 \mathbf{v}_1 v 1 . If it is not, state all of the ways in which it fails the required properties.
(a) [ 0 0 0 − 1 ] = [ 0 1 1 0 ] [ 1 0 0 0 ] [ 0 1 − 1 0 ] \begin{bmatrix}
0 & 0 \\ 0 & -1
\end{bmatrix} = \begin{bmatrix}
0 & 1 \\ 1 & 0
\end{bmatrix} \begin{bmatrix}
1 & 0 \\ 0 & 0
\end{bmatrix} \begin{bmatrix}
0 & 1 \\ -1 & 0
\end{bmatrix}\qquad [ 0 0 0 − 1 ] = [ 0 1 1 0 ] [ 1 0 0 0 ] [ 0 − 1 1 0 ]
(b) [ 0 0 0 − 1 ] = I 2 [ 0 0 0 − 1 ] I 2 \begin{bmatrix}
0 & 0 \\ 0 & -1
\end{bmatrix} =
\mathbf{I}_2 \begin{bmatrix}
0 & 0 \\ 0 & -1
\end{bmatrix}
\mathbf{I}_2 [ 0 0 0 − 1 ] = I 2 [ 0 0 0 − 1 ] I 2
(c)
[ 1 0 0 2 1 0 ] = [ α 0 − α 0 1 0 α 0 − α ] [ 2 0 0 2 0 0 ] [ 0 1 1 0 ] , α = 1 / 2 \begin{bmatrix}
1 & 0\\ 0 & \sqrt{2}\\ 1 & 0
\end{bmatrix} = \begin{bmatrix}
\alpha & 0 & -\alpha \\ 0 & 1 & 0 \\ \alpha & 0 & -\alpha
\end{bmatrix} \begin{bmatrix}
\sqrt{2} & 0 \\ 0 & \sqrt{2} \\ 0 & 0
\end{bmatrix} \begin{bmatrix}
0 & 1 \\ 1 & 0
\end{bmatrix}, \quad \alpha=1/\sqrt{2} ⎣ ⎡ 1 0 1 0 2 0 ⎦ ⎤ = ⎣ ⎡ α 0 α 0 1 0 − α 0 − α ⎦ ⎤ ⎣ ⎡ 2 0 0 0 2 0 ⎦ ⎤ [ 0 1 1 0 ] , α = 1/ 2
(d)
[ 2 2 − 1 1 0 0 ] = I 3 [ 2 0 0 2 0 0 ] [ α α − α α ] , α = 1 / 2 \begin{bmatrix}
\sqrt{2} & \sqrt{2}\\ -1 & 1\\ 0 & 0
\end{bmatrix} =
\mathbf{I}_3 \begin{bmatrix}
2 & 0 \\ 0 & \sqrt{2} \\ 0 & 0
\end{bmatrix} \begin{bmatrix}
\alpha & \alpha \\ -\alpha & \alpha
\end{bmatrix}, \quad \alpha=1/\sqrt{2} ⎣ ⎡ 2 − 1 0 2 1 0 ⎦ ⎤ = I 3 ⎣ ⎡ 2 0 0 0 2 0 ⎦ ⎤ [ α − α α α ] , α = 1/ 2
✍ Apply Theorem 7.3.2 to find an SVD of A = [ 1 0 0 0 0 1 − 1 − 1 ] . \mathbf{A}=\displaystyle \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix}. A = ⎣ ⎡ 1 0 0 − 1 0 0 1 − 1 ⎦ ⎤ .
⌨ Let x
be a vector of 1000 equally spaced points between 0 and 1. Suppose A n \mathbf{A}_n A n is the 1000 × n 1000\times n 1000 × n matrix whose ( i , j ) (i,j) ( i , j ) entry is x i j − 1 x_i^{j-1} x i j − 1 for j = 1 , … , n j=1,\ldots,n j = 1 , … , n .
(a) Print out the singular values of A 1 \mathbf{A}_1 A 1 , A 2 \mathbf{A}_2 A 2 , and A 3 \mathbf{A}_3 A 3 .
(b) Make a log-linear plot of the singular values of A 40 \mathbf{A}_{40} A 40 .
(c) Repeat part (b) after converting the elements of x
to single precision.
(d) Having seen the plot for part (c), which singular values in part (b) do you suspect may be incorrect?
⌨ See Demo 7.1.5 for how to get the “mandrill” test image. Make a log-linear scatter plot of the singular values of the matrix of grayscale intensity values. (The shape of this graph is surprisingly similar across a wide range of images.)
✍ Prove that for a square real matrix A \mathbf{A} A , ∥ A ∥ 2 = ∥ A T ∥ 2 \| \mathbf{A} \|_2=\| \mathbf{A}^T \|_2 ∥ A ∥ 2 = ∥ A T ∥ 2 .
✍ Prove (7.3.14) of Theorem 7.3.3 , given that (7.3.13) is true. (Hint: If the SVD of A \mathbf{A} A is known, what is the SVD of A + \mathbf{A}^{+} A + ?)
✍ Suppose A ∈ R m × n \mathbf{A}\in\mathbb{R}^{m\times n} A ∈ R m × n , with m > n m>n m > n , has the thin SVD A = U ^ S ^ V T \mathbf{A}=\hat{\mathbf{U}}\hat{\mathbf{S}}\mathbf{V}^T A = U ^ S ^ V T . Show that the matrix A A + \mathbf{A}\mathbf{A}^{+} A A + is equal to U ^ U ^ T \hat{\mathbf{U}}\hat{\mathbf{U}}^T U ^ U ^ T . (You must be careful with matrix sizes in this derivation.)
✍ In (3.2.6) we defined the 2-norm condition number of a rectangular matrix as κ ( A ) = ∥ A ∥ ⋅ ∥ A + ∥ \kappa(\mathbf{A})=\|\mathbf{A}\|\cdot \|\mathbf{A}^{+}\| κ ( A ) = ∥ A ∥ ⋅ ∥ A + ∥ , and then claimed (in the real case) that κ ( A ∗ A ) = κ ( A ) 2 \kappa(\mathbf{A}^*\mathbf{A})=\kappa(\mathbf{A})^2 κ ( A ∗ A ) = κ ( A ) 2 . Prove this assertion using the SVD .
✍ Show that the square of each singular value of A \mathbf{A} A is an eigenvalue of the matrix A A ∗ \mathbf{A}\mathbf{A}^* A A ∗ for any m × n m\times n m × n matrix A \mathbf{A} A . (You should consider the cases m > n m>n m > n and m ≤ n m\le n m ≤ n separately.)
✍ In this problem you will see how (7.3.13) is proved in the real case.
(a) Use the technique of Lagrange multipliers to show that among vectors that satisfy ∥ x ∥ 2 2 = 1 \|\mathbf{x}\|_2^2=1 ∥ x ∥ 2 2 = 1 , any vector that maximizes ∥ A x ∥ 2 2 \|\mathbf{A}\mathbf{x}\|_2^2 ∥ Ax ∥ 2 2 must be an eigenvector of A T A \mathbf{A}^T\mathbf{A} A T A . It will help to know that if B \mathbf{B} B is any symmetric matrix, the gradient of the scalar function x T B x \mathbf{x}^T\mathbf{B}\mathbf{x} x T Bx with respect to x \mathbf{x} x is 2 B x 2\mathbf{B}\mathbf{x} 2 Bx .
(b) Use the result of part (a) to prove (7.3.13) for real matrices.
✍ Suppose A ∈ R n × n \mathbf{A}\in\mathbb{R}^{n \times n} A ∈ R n × n , and define C \mathbf{C} C as in (7.3.7) .
(a) Suppose that v = [ x y ] \mathbf{v}=\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} v = [ x y ] , and write the block equation C v = λ v \mathbf{C}\mathbf{v} = \lambda \mathbf{v} Cv = λ v as two individual equations involving both x \mathbf{x} x and y \mathbf{y} y .
(b) By applying some substitutions, rewrite the equations from part (a) as one in which x \mathbf{x} x was eliminated, and another in which y \mathbf{y} y was eliminated.
(c) Substitute the SVD A = U S V T \mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^T A = US V T and explain why λ 2 = σ k 2 \lambda^2=\sigma_k^2 λ 2 = σ k 2 for some singular value σ k \sigma_k σ k .
(d) As a more advanced variation, modify the argument to show that λ = 0 \lambda=0 λ = 0 is another possibility if A \mathbf{A} A is not square.