
Dimension reduction

The SVD has another important property that proves very useful in a variety of applications. Let $\mathbf{A}$ be a real $m\times n$ matrix with SVD $\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^T$ and (momentarily) $m\ge n$. Another way of writing the thin form of the SVD is

$$
\begin{split}
\mathbf{A} = \hat{\mathbf{U}}\hat{\mathbf{S}}\mathbf{V}^T
&= \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_n \end{bmatrix}
\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix}
\begin{bmatrix} \mathbf{v}_1^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} \\
&= \begin{bmatrix} \sigma_1\mathbf{u}_1 & \cdots & \sigma_n\mathbf{u}_n \end{bmatrix}
\begin{bmatrix} \mathbf{v}_1^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} \\
&= \sigma_1 \mathbf{u}_{1}\mathbf{v}_{1}^T + \cdots + \sigma_r \mathbf{u}_{r}\mathbf{v}_{r}^T
= \sum_{i=1}^r \sigma_i \mathbf{u}_{i}\mathbf{v}_{i}^T,
\end{split}
$$

where $r$ is the rank of $\mathbf{A}$. The final formula also holds for the case $m<n$.
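As a quick numerical check of this identity, the thin SVD returned by a library can be re-expanded as a sum of rank-1 outer products. A minimal NumPy sketch, using a random matrix purely for illustration:

```python
import numpy as np

# Illustrative random matrix (any real matrix works).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Thin SVD: U is 6x4, sigma has 4 entries, Vt is 4x4.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as the sum of rank-1 terms sigma_i * u_i * v_i^T.
A_sum = sum(sigma[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(sigma)))

print(np.allclose(A, A_sum))  # True
```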

Each outer product $\mathbf{u}_{i}\mathbf{v}_{i}^T$ is a rank-1 matrix of unit 2-norm. Because the singular values appear in decreasing order, Equation (7.5.1) expresses $\mathbf{A}$ as a sum of decreasingly important contributions. This motivates the definition, for $1\le k \le r$,

$$
\mathbf{A}_k = \sum_{i=1}^k \sigma_i \mathbf{u}_{i}\mathbf{v}_{i}^T = \mathbf{U}_k \mathbf{S}_k \mathbf{V}_k^T,
$$

where $\mathbf{U}_k$ and $\mathbf{V}_k$ are the first $k$ columns of $\mathbf{U}$ and $\mathbf{V}$, respectively, and $\mathbf{S}_k$ is the upper-left $k\times k$ submatrix of $\mathbf{S}$.

The rank of a sum of matrices is always less than or equal to the sum of the ranks, so $\mathbf{A}_k$ has rank at most $k$. It turns out that $\mathbf{A}_k$ is the best rank-$k$ approximation of $\mathbf{A}$, as measured in the matrix 2-norm.
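This optimality (the Eckart–Young theorem) can be observed numerically: the 2-norm error of the rank-$k$ truncation equals $\sigma_{k+1}$. Here is a minimal NumPy sketch, with a random test matrix assumed for illustration:

```python
import numpy as np

# Illustrative random matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Rank-k truncation A_k = U_k S_k V_k^T.
Ak = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

# The 2-norm of A - A_k is the largest discarded singular value, sigma_{k+1}.
err = np.linalg.norm(A - Ak, 2)
print(err, sigma[k])
```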

7.5.1 Compression

If the singular values of $\mathbf{A}$ decrease sufficiently rapidly, then $\mathbf{A}_{k}$ may capture the most significant behavior of the matrix for a reasonably small value of $k$.

The use of dimension reduction offered by low-rank SVD approximation goes well beyond simply reducing computation time. By isolating the most important contributions to the matrix, dimension reduction can uncover deep connections and trends that are otherwise obscured by weaker effects and noise.

One useful way to quantify the decay in the singular values is to compute

$$
s_k = \sum_{i=1}^k \sigma_i^2, \qquad \tau_k = \frac{s_k}{s_r}, \qquad k=1,\ldots,r.
$$

Clearly $0\le \tau_k \le 1$, and $\tau_k$ is non-decreasing as a function of $k$. We can think of $\tau_k$ as the fraction of energy (or in statistical terms, variance) contained in the singular values up to and including the $k$th.[1]
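In code, $s_k$ and $\tau_k$ amount to a cumulative sum over the squared singular values. A sketch in NumPy, assuming a test matrix with an artificially decaying spectrum:

```python
import numpy as np

# Build a matrix with rapidly decaying singular values for illustration:
# scaling the columns by 1, 1/2, 1/4, ... forces a decaying spectrum.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 8)) * (2.0 ** -np.arange(8))
sigma = np.linalg.svd(A, compute_uv=False)

s = np.cumsum(sigma**2)   # s_k: energy in the first k singular values
tau = s / s[-1]           # tau_k: fraction of total energy (s[-1] is s_r)
print(tau)
```

Scanning `tau` for the first entry exceeding a threshold such as 0.99 gives the smallest rank that captures that fraction of the energy.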

Not all data sets can be reduced effectively to a small number of dimensions, but as Demo 7.5.2 illustrates, in some cases reduction reveals information that corresponds to real-world understanding.

7.5.3 Exercises

  1. ✍ Suppose that $\mathbf{A}$ is an $n\times n$ matrix. Explain why $\sigma_n$ is the distance (in the 2-norm) from $\mathbf{A}$ to the set of all singular matrices.

  2. ✍ Suppose $\mathbf{A}$ is a $7\times 4$ matrix and the eigenvalues of $\mathbf{A}^*\mathbf{A}$ are 3, 4, 7, and 10. How close is $\mathbf{A}$ in the 2-norm to (a) a rank-3 matrix? (b) a rank-2 matrix?

  3. (a) ⌨ Find the rank-1 matrix closest to

    $$\mathbf{A}=\begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix},$$

    as measured in the 2-norm.

    (b) ⌨ Repeat part (a) for

    $$\mathbf{A}=\begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix}.$$

  4. ✍ Find the rank-1 matrix closest to

    $$\mathbf{A}=\begin{bmatrix} 1 & b \\ b & 1 \end{bmatrix},$$

    as measured in the 2-norm, where $b>0$.

  5. ⌨ Following Demo 7.5.1 as a guide, load the “mandrill” test image and convert it to a matrix of floating-point grayscale pixel intensities. Using the SVD, display as images the best approximations of rank 5, 10, 15, and 20.

Footnotes
  1. In statistics this quantity may be interpreted as the fraction of explained variance.