Skip to article frontmatterSkip to article content

Singular value decomposition

We now introduce another factorization that is as fundamental as the EVD.

7.3.1Connections to the EVD

Except for some unimportant technicalities, the eigenvectors of AA\mathbf{A}^*\mathbf{A}, when appropriately ordered and normalized, are right singular vectors of A\mathbf{A}. The left singular vectors could then be deduced from the identity AV=US\mathbf{A}\mathbf{V} = \mathbf{U}\mathbf{S}.

Another close connection between EVD and SVD comes via the (m+n)×(m+n)(m+n)\times (m+n) matrix

C=[0AA0].\mathbf{C} = \begin{bmatrix} 0 & \mathbf{A}^* \\ \mathbf{A} & 0 \end{bmatrix}.

If σ is a singular value of B\mathbf{B}, then σ and σ-\sigma are eigenvalues of C\mathbf{C}, and the associated eigenvector immediately reveals a left and a right singular vector (see Exercise 11). This connection is implicitly exploited by software to compute the SVD.

7.3.2Interpreting the SVD

Another way to write A=USV\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^* is

AV=US.\mathbf{A}\mathbf{V}=\mathbf{U}\mathbf{S}.

Taken columnwise, this equation means

Avk=σkuk,k=1,,r=min{m,n}.\mathbf{A} \mathbf{v}_{k} = \sigma_k \mathbf{u}_{k}, \qquad k=1,\ldots,r=\min\{m,n\}.

In words, each right singular vector is mapped by A\mathbf{A} to a scaled version of its corresponding left singular vector; the magnitude of scaling is its singular value.

Both the SVD and the EVD describe a matrix in terms of some special vectors and a small number of scalars. Table 7.3.1 summarizes the key differences. The SVD sacrifices having the same basis in both source and image spaces—after all, they may not even have the same dimension—but as a result gains orthogonality in both spaces.

Table 7.3.1:Comparison of the EVD and SVD

EVDSVD
exists for most square matricesexists for all rectangular and square matrices
Axk=λkxk\mathbf{A}\mathbf{x}_k = \lambda_k \mathbf{x}_kAvk=σkuk\mathbf{A} \mathbf{v}_k = \sigma_k \mathbf{u}_k
same basis for domain and range of A\mathbf{A}two orthonormal bases
may have poor conditioningperfectly conditioned

7.3.3Thin form

In The QR factorization we saw that a matrix has both full and thin forms of the QR factorization. A similar situation holds with the SVD.

Suppose A\mathbf{A} is m×nm\times n with m>nm > n and let A=USV\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^* be an SVD. The last mnm-n rows of S\mathbf{S} are all zero due to the fact that S\mathbf{S} is diagonal. Hence

US=[u1unun+1um][σ1σn0]=[u1un][σ1σn]=U^S^,\begin{align*} \mathbf{U} \mathbf{S} & = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n & \mathbf{u}_{n+1} & \cdots & \mathbf{u}_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & & \\ & \boldsymbol{0} & \\ & & \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} = \hat{\mathbf{U}} \hat{\mathbf{S}}, \end{align*}

in which U^\hat{\mathbf{U}} is m×nm\times n and S^\hat{\mathbf{S}} is n×nn\times n. This allows us to define the thin SVD

A=U^S^V,\mathbf{A}=\hat{\mathbf{U}}\hat{\mathbf{S}}\mathbf{V}^*,

in which S^\hat{\mathbf{S}} is square and diagonal and U^\hat{\mathbf{U}} is ONC but not square.

The thin form retains all the information about A\mathbf{A} from the SVD; the factorization is still an equality, not an approximation. It is computationally preferable when mnm \gg n, since it requires far less storage than a full SVD. For a matrix with more columns than rows, one can derive a thin form by taking the adjoint of the thin SVD of A\mathbf{A}^*.

7.3.4SVD and the 2-norm

The SVD is intimately connected to the 2-norm, as the following theorem describes.

The conclusion (7.3.13) can be proved by vector calculus. In the square case m=nm=n, A\mathbf{A} having full rank is identical to being invertible. The SVD is the usual means for computing the 2-norm and condition number of a matrix.

7.3.5Exercises

  1. ✍ Each factorization below is algebraically correct. The notation In\mathbf{I}_n means an n×nn\times n identity. In each case, determine whether it is an SVD. If it is, write down σ1\sigma_1, u1\mathbf{u}_1, and v1\mathbf{v}_1. If it is not, state all of the ways in which it fails the required properties.

    (a) [0001]=[0110][1000][0110]\begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\qquad (b) [0001]=I2[0001]I2\begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} = \mathbf{I}_2 \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} \mathbf{I}_2

    (c) [100210]=[α0α010α0α][200200][0110],α=1/2\begin{bmatrix} 1 & 0\\ 0 & \sqrt{2}\\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \alpha & 0 & -\alpha \\ 0 & 1 & 0 \\ \alpha & 0 & -\alpha \end{bmatrix} \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \alpha=1/\sqrt{2}

    (d) [221100]=I3[200200][αααα],α=1/2\begin{bmatrix} \sqrt{2} & \sqrt{2}\\ -1 & 1\\ 0 & 0 \end{bmatrix} = \mathbf{I}_3 \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \alpha & \alpha \\ -\alpha & \alpha \end{bmatrix}, \quad \alpha=1/\sqrt{2}

  2. ✍ Apply Theorem 7.3.2 to find an SVD of A=[10000111].\mathbf{A}=\displaystyle \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix}.

  3. ⌨ Let x be a vector of 1000 equally spaced points between 0 and 1. Suppose An\mathbf{A}_n is the 1000×n1000\times n matrix whose (i,j)(i,j) entry is xij1x_i^{j-1} for j=1,,nj=1,\ldots,n.

    (a) Print out the singular values of A1\mathbf{A}_1, A2\mathbf{A}_2, and A3\mathbf{A}_3.

    (b) Make a log-linear plot of the singular values of A40\mathbf{A}_{40}.

    (c) Repeat part (b) after converting the elements of x to single precision.

    (d) Having seen the plot for part (c), which singular values in part (b) do you suspect may be incorrect?

  4. ⌨ See Demo 7.1.5 for how to get the “mandrill” test image. Make a log-linear scatter plot of the singular values of the matrix of grayscale intensity values. (The shape of this graph is surprisingly similar across a wide range of images.)

  5. ✍ Prove that for a square real matrix A\mathbf{A}, A2=AT2\| \mathbf{A} \|_2=\| \mathbf{A}^T \|_2.

  6. ✍ Prove (7.3.14) of Theorem 7.3.3, given that (7.3.13) is true. (Hint: If the SVD of A\mathbf{A} is known, what is the SVD of A+\mathbf{A}^{+}?)

  7. ✍ Suppose ARm×n\mathbf{A}\in\mathbb{R}^{m\times n}, with m>nm>n, has the thin SVD A=U^S^VT\mathbf{A}=\hat{\mathbf{U}}\hat{\mathbf{S}}\mathbf{V}^T. Show that the matrix AA+\mathbf{A}\mathbf{A}^{+} is equal to U^U^T\hat{\mathbf{U}}\hat{\mathbf{U}}^T. (You must be careful with matrix sizes in this derivation.)

  1. ✍ In (3.2.6) we defined the 2-norm condition number of a rectangular matrix as κ(A)=AA+\kappa(\mathbf{A})=\|\mathbf{A}\|\cdot \|\mathbf{A}^{+}\|, and then claimed (in the real case) that κ(AA)=κ(A)2\kappa(\mathbf{A}^*\mathbf{A})=\kappa(\mathbf{A})^2. Prove this assertion using the SVD.

  2. ✍ Show that the square of each singular value of A\mathbf{A} is an eigenvalue of the matrix AA\mathbf{A}\mathbf{A}^* for any m×nm\times n matrix A\mathbf{A}. (You should consider the cases m>nm>n and mnm\le n separately.)

  1. ✍ In this problem you will see how (7.3.13) is proved in the real case.

    (a) Use the technique of Lagrange multipliers to show that among vectors that satisfy x22=1\|\mathbf{x}\|_2^2=1, any vector that maximizes Ax22\|\mathbf{A}\mathbf{x}\|_2^2 must be an eigenvector of ATA\mathbf{A}^T\mathbf{A}. It will help to know that if B\mathbf{B} is any symmetric matrix, the gradient of the scalar function xTBx\mathbf{x}^T\mathbf{B}\mathbf{x} with respect to x\mathbf{x} is 2Bx2\mathbf{B}\mathbf{x}.

    (b) Use the result of part (a) to prove (7.3.13) for real matrices.

  1. ✍ Suppose ARn×n\mathbf{A}\in\mathbb{R}^{n \times n}, and define C\mathbf{C} as in (7.3.7).

    (a) Suppose that v=[xy]\mathbf{v}=\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}, and write the block equation Cv=λv\mathbf{C}\mathbf{v} = \lambda \mathbf{v} as two individual equations involving both x\mathbf{x} and y\mathbf{y}.

    (b) By applying some substitutions, rewrite the equations from part (a) as one in which x\mathbf{x} was eliminated, and another in which y\mathbf{y} was eliminated.

    (c) Substitute the SVD A=USVT\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^T and explain why λ2=σk2\lambda^2=\sigma_k^2 for some singular value σk\sigma_k.

    (d) As a more advanced variation, modify the argument to show that λ=0\lambda=0 is another possibility if A\mathbf{A} is not square.