
3.3 The QR factorization

Sets of vectors satisfying a certain property, orthogonality, are useful both theoretically and computationally. Vectors $\mathbf{q}_1,\ldots,\mathbf{q}_k$ are said to be orthogonal if

$$
\mathbf{q}_i^T \mathbf{q}_j = 0 \quad \text{whenever } i \neq j, \qquad (3.3.1)
$$

and orthonormal if, in addition, each $\mathbf{q}_i$ has unit 2-norm.

In two and three dimensions, orthogonality is the same as perpendicularity.

Orthogonal vectors simplify inner products. For example, if $\mathbf{q}_1$ and $\mathbf{q}_2$ are orthogonal, then

$$
\| \mathbf{q}_1 - \mathbf{q}_2 \|_2^2 = (\mathbf{q}_1-\mathbf{q}_2)^T(\mathbf{q}_1-\mathbf{q}_2) = \mathbf{q}_1^T\mathbf{q}_1 - 2 \mathbf{q}_1^T\mathbf{q}_2 + \mathbf{q}_2^T\mathbf{q}_2 = \|\mathbf{q}_1\|_2^2 + \|\mathbf{q}_2\|_2^2. \qquad (3.3.2)
$$

As in the rest of this chapter, we will be using the 2-norm exclusively.

Equation (3.3.2) is the key to the computational attractiveness of orthogonality. Figure 3.3.1 shows how nonorthogonal vectors can allow a multidimensional version of subtractive cancellation, in which $\|\mathbf{x}-\mathbf{y}\|$ is much smaller than $\|\mathbf{x}\|$ and $\|\mathbf{y}\|$. As the figure illustrates, orthogonal vectors do not allow this phenomenon. By (3.3.2), the magnitude of a vector difference or sum is larger than the magnitudes of the original vectors.

Figure 3.3.1: Nonorthogonal vectors can cause cancellation when subtracted, but orthogonal vectors never do.
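
To make the contrast concrete, here is a small numerical illustration in Julia (a minimal sketch; the vectors are invented for demonstration):

```julia
using LinearAlgebra

# Nearly parallel (nonorthogonal) vectors: the difference is far smaller than
# either vector, i.e., subtractive cancellation in several dimensions.
x = [1.0, 1.0]
y = [1.0, 1.0 + 1e-12]
@show norm(x - y) / norm(x)           # ≈ 7e-13

# Orthogonal vectors: by (3.3.2), ‖q₁ − q₂‖² = ‖q₁‖² + ‖q₂‖², so the
# difference cannot be small relative to the original vectors.
q₁ = [3.0, 0.0]
q₂ = [0.0, 4.0]
@show norm(q₁ - q₂)                   # 5.0
@show sqrt(norm(q₁)^2 + norm(q₂)^2)   # 5.0
```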

3.3.1 Orthogonal and ONC matrices

Statements about orthogonal vectors are often made more easily in matrix form. Let $\mathbf{Q}$ be an $n\times k$ matrix whose columns $\mathbf{q}_1, \ldots, \mathbf{q}_k$ are orthogonal vectors. The orthogonality conditions (3.3.1) become simply that $\mathbf{Q}^T\mathbf{Q}$ is a diagonal matrix, since

$$
\mathbf{Q}^T \mathbf{Q} =
\begin{bmatrix} \mathbf{q}_1^T \\ \mathbf{q}_2^T \\ \vdots \\ \mathbf{q}_k^T \end{bmatrix}
\begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_k \end{bmatrix}
=
\begin{bmatrix}
\mathbf{q}_1^T\mathbf{q}_1 & \mathbf{q}_1^T\mathbf{q}_2 & \cdots & \mathbf{q}_1^T\mathbf{q}_k \\
\mathbf{q}_2^T\mathbf{q}_1 & \mathbf{q}_2^T\mathbf{q}_2 & \cdots & \mathbf{q}_2^T\mathbf{q}_k \\
\vdots & \vdots & & \vdots \\
\mathbf{q}_k^T\mathbf{q}_1 & \mathbf{q}_k^T\mathbf{q}_2 & \cdots & \mathbf{q}_k^T\mathbf{q}_k
\end{bmatrix}.
$$

If the columns of $\mathbf{Q}$ are orthonormal, then $\mathbf{Q}^T\mathbf{Q}$ is the $k\times k$ identity matrix. This is such an important property that we will break with common practice here and give this type of matrix a name: a matrix whose columns form an orthonormal set is called an ONC matrix.
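
This property is easy to check numerically. The following is a minimal sketch using a random matrix; `qr` and `norm` come from Julia's standard `LinearAlgebra` library:

```julia
using LinearAlgebra

A = rand(6, 3)
Q = Matrix(qr(A).Q)      # 6×3 matrix with orthonormal columns
@show size(Q)
@show norm(Q' * Q - I)   # essentially zero, so Q'Q is the 3×3 identity
```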

Of particular interest is a square ONC matrix, which is known as an orthogonal matrix.[1]

Orthogonal matrices have properties beyond Theorem 3.3.1.
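
Among those properties are the facts that the transpose of an orthogonal matrix is its inverse and that multiplication by an orthogonal matrix preserves the 2-norm of every vector. A quick numerical check (a sketch; the orthogonal matrix is taken from the QR factorization of random data):

```julia
using LinearAlgebra

Q = Matrix(qr(rand(4, 4)).Q)   # a 4×4 orthogonal matrix
x = rand(4)
@show norm(Q * x) - norm(x)    # ≈ 0: multiplication by Q preserves the 2-norm
@show norm(Q' * Q - I)         # ≈ 0: the transpose of Q acts as its inverse
```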

3.3.2 Orthogonal factorization

Now we come to another important way to factor a matrix: the QR factorization. Any real $m\times n$ matrix $\mathbf{A}$ with $m\ge n$ can be written as $\mathbf{A}=\mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is an $m\times m$ orthogonal matrix and $\mathbf{R}$ is an $m\times n$ upper triangular matrix. As we will show below, the QR factorization plays a role in linear least squares analogous to the role of LU factorization in linear systems.

In most introductory books on linear algebra, the QR factorization is derived through a process known as Gram–Schmidt orthogonalization. However, while it is an important tool for theoretical work, the Gram–Schmidt process is numerically unstable. We will consider an alternative construction in Computing QR factorizations.
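
For orientation only, here is a minimal sketch of classical Gram–Schmidt producing the factors directly; the function name and code are ours, not the text's, and this version should not be used in practice for exactly the reason just mentioned:

```julia
using LinearAlgebra

# Classical Gram–Schmidt: orthonormalize the columns of A, producing Q and R
# with A ≈ QR. Illustrative only; unstable in floating-point arithmetic.
function cgs_qr(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:, j]
        for i in 1:j-1
            R[i, j] = dot(Q[:, i], A[:, j])   # component of column j along q_i
            v -= R[i, j] * Q[:, i]            # remove that component
        end
        R[j, j] = norm(v)
        Q[:, j] = v / R[j, j]
    end
    return Q, R
end

A = rand(8, 4)
Q, R = cgs_qr(A)
@show norm(A - Q * R)    # small: the factors reproduce A
@show norm(Q' * Q - I)   # small here, but can degrade badly for ill-conditioned A
```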

When $m$ is much larger than $n$, which is often the case, there is a compressed form of the factorization that is more efficient. In the product

$$
\mathbf{A} =
\begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_m \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & r_{nn} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix},
$$

the vectors $\mathbf{q}_{n+1},\ldots,\mathbf{q}_m$ always get multiplied by zero. Nothing about $\mathbf{A}$ is lost if we delete them and reduce the factorization to the equivalent form

$$
\mathbf{A} =
\begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_n \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & r_{nn}
\end{bmatrix}
= \hat{\mathbf{Q}} \hat{\mathbf{R}}. \qquad (3.3.6)
$$

This is known as the thin QR factorization: $\hat{\mathbf{Q}}$ is $m\times n$ with orthonormal columns, and $\hat{\mathbf{R}}$ is $n\times n$ and upper triangular.
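
In Julia, these thin factors can be obtained from the built-in `qr`; the following is a minimal sketch with a random matrix:

```julia
using LinearAlgebra

A = rand(8, 4)
F = qr(A)
Qhat = Matrix(F.Q)            # 8×4 ONC factor
Rhat = F.R                    # 4×4 upper triangular factor
@show size(Qhat), size(Rhat)
@show norm(A - Qhat * Rhat)   # essentially zero
```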

3.3.3 Least squares and QR

If we substitute the thin factorization (3.3.6) into the normal equations (3.2.3), we can simplify expressions a great deal.

$$
\begin{split}
\mathbf{A}^T\mathbf{A} \mathbf{x} &= \mathbf{A}^T \mathbf{b}, \\
\hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \hat{\mathbf{Q}} \hat{\mathbf{R}} \mathbf{x} &= \hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \mathbf{b}, \\
\hat{\mathbf{R}}^T \hat{\mathbf{R}} \mathbf{x} &= \hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \mathbf{b}.
\end{split}
$$

In order to have the normal equations be well posed, we require that $\mathbf{A}$ is not rank-deficient (as proved in Theorem 3.2.2). This is enough to guarantee that $\hat{\mathbf{R}}$ is nonsingular (see Exercise 4). Therefore, its transpose is nonsingular as well, and we arrive at

$$
\hat{\mathbf{R}} \mathbf{x} = \hat{\mathbf{Q}}^T \mathbf{b}.
$$

This algorithm is implemented in Function 3.3.2. It is essentially the algorithm used internally by Julia when `A\b` is called. The execution time is dominated by the factorization, the most common method for which is described in Computing QR factorizations.
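
The essential steps are short enough to write out directly. This is a sketch with our own function name, not the text's Function 3.3.2:

```julia
using LinearAlgebra

# Solve the least-squares problem min ‖b − Ax‖₂ via the thin QR factorization,
# using the triangular system R̂x = Q̂ᵀb derived above.
function lsqr_via_qr(A, b)
    F = qr(A)
    Qhat = Matrix(F.Q)                 # m×n ONC factor
    Rhat = F.R                         # n×n upper triangular factor
    z = Qhat' * b                      # right-hand side Q̂ᵀb
    return UpperTriangular(Rhat) \ z   # backward substitution
end

A = rand(20, 4)
b = rand(20)
x = lsqr_via_qr(A, b)
@show norm(x - A \ b)   # agrees closely with Julia's built-in backslash
```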

The solution of least-squares problems via QR factorization is more stable than when the normal equations are formulated and solved directly.
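
As a small illustration of the difference, consider the following sketch. The matrix is chosen so that $\mathbf{A}^T\mathbf{A}$ is extremely ill conditioned in floating point, while the exact least-squares solution is $[1,1]$:

```julia
using LinearAlgebra

ϵ = 1e-7
A = [1.0 1.0; ϵ 0.0; 0.0 ϵ]
b = [2.0, ϵ, ϵ]                  # A*[1,1] equals b, so the least-squares solution is [1,1]

x_normal = (A' * A) \ (A' * b)   # normal equations: cond(A'A) ≈ 2/ϵ² ≈ 2e14
F = qr(A)
x_qr = UpperTriangular(F.R) \ (Matrix(F.Q)' * b)

@show norm(x_normal - [1, 1])    # error many orders of magnitude above machine precision
@show norm(x_qr - [1, 1])        # much smaller error; QR works with A, whose condition is only about 1/ϵ
```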

3.3.4 Exercises

  1. ✍ Prove part 3 of Theorem 3.3.1.

  2. ✍ Prove Theorem 3.3.2. For the third part, use the definition of the 2-norm as an induced matrix norm, then apply some of our other results as needed.

  3. ⌨ Let $t_0,\ldots,t_m$ be $m+1$ equally spaced points in $[-1,1]$. Let $\mathbf{A}$ be the matrix in (3.1.2) for $m=400$ and fitting by polynomials of degree less than 5. Find the thin QR factorization of $\mathbf{A}$, and, on a single graph, plot every column of $\hat{\mathbf{Q}}$ as a function of the vector $t$.

  4. ✍ Prove that if the $m\times n$ ($m\ge n$) matrix $\mathbf{A}$ is not rank-deficient, then the factor $\hat{\mathbf{R}}$ of the thin QR factorization is invertible. (Hint: Suppose on the contrary that $\hat{\mathbf{R}}$ is singular. Show using the factored form of $\mathbf{A}$ that this would imply that $\mathbf{A}$ is rank-deficient.)

  5. ✍ Let $\mathbf{A}$ be $m\times n$ with $m>n$. Show that if $\mathbf{A}=\mathbf{Q}\mathbf{R}$ is a QR factorization and $\mathbf{R}$ has rank $n$, then $\mathbf{A}^+=\mathbf{R}^+\mathbf{Q}^T$.

  6. ✍ Let $\mathbf{A}$ be $m\times n$ with $m>n$. Show that if $\mathbf{A}=\hat{\mathbf{Q}}\hat{\mathbf{R}}$ is a thin QR factorization and $\hat{\mathbf{R}}$ is invertible, then $\mathbf{A}^+=\hat{\mathbf{R}}^{-1}\hat{\mathbf{Q}}^T$.

  7. ⌨ Repeat Exercise 3.1.2, but use thin QR factorization rather than the backslash to solve the least-squares problem.

  8. ✍ The matrix $\mathbf{P}=\hat{\mathbf{Q}} \hat{\mathbf{Q}}^T$ derived from the thin QR factorization has some interesting and important properties.

    (a) Prove that $\mathbf{P}=\mathbf{A}\mathbf{A}^+$.

    (b) Prove that $\mathbf{P}^2=\mathbf{P}$. (This property defines a projection matrix.)

    (c) Any vector $\mathbf{x}$ may be written trivially as $\mathbf{x}=\mathbf{u}+\mathbf{v}$, where $\mathbf{u}=\mathbf{P}\mathbf{x}$ and $\mathbf{v}=(\mathbf{I}-\mathbf{P})\mathbf{x}$. Prove that $\mathbf{u}$ and $\mathbf{v}$ are orthogonal. (Together with part (b), this means that $\mathbf{P}$ is an orthogonal projector.)

Footnotes
  1. Confusingly, a square matrix whose columns are orthogonal is not necessarily an orthogonal matrix; the columns must be orthonormal, which is a stricter condition.