
Vector and matrix norms

The manipulations on matrices and vectors so far in this chapter have been algebraic, much like those in an introductory linear algebra course. In order to progress to the analysis of the algorithms we have introduced, we need a way to measure the size of vectors and matrices—size in the sense of magnitude or distance, not the number of rows and columns.

2.7.1 Vector norms

For vectors, we use a norm \| \cdot \|, which is a function from \real^n to \real with the following properties for all n-vectors \mathbf{x}, \mathbf{y} and scalars α:[1]

\begin{align*} \norm{\mathbf{x}} &\ge 0, \\ \norm{\mathbf{x}} &=0 \;\Leftrightarrow \; \mathbf{x}=\boldsymbol{0}, \\ \norm{\alpha \mathbf{x}} &= \abs{\alpha}\, \norm{\mathbf{x}}, \\ \norm{\mathbf{x}+\mathbf{y}} & \le \norm{\mathbf{x}} + \norm{\mathbf{y}}. \end{align*}

The last of these properties is known as the triangle inequality. It is natural to interpret \| \mathbf{x} \|=\| \mathbf{x}-\boldsymbol{0} \| as the distance from \mathbf{x} to the origin and \| \mathbf{x}-\mathbf{y} \| as the distance from \mathbf{x} to \mathbf{y}. We will be using only the three most important vector norms, defined as follows.
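
For reference, the standard definitions of these three norms (the 2-norm, the ∞-norm, and the 1-norm) are

\| \mathbf{x} \|_2 = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}, \qquad \| \mathbf{x} \|_\infty = \max_{i=1,\ldots,n} |x_i|, \qquad \| \mathbf{x} \|_1 = \sum_{i=1}^n |x_i|.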

The 2-norm corresponds to ordinary Euclidean distance.

In any norm, we refer to a vector \mathbf{x} satisfying \| \mathbf{x} \|=1 as a unit vector. For any nonzero vector \mathbf{v} we can find a unit vector through the normalization \mathbf{x}=\mathbf{v}/\|\mathbf{v}\|. Thus, we can interpret

\mathbf{v} = \| \mathbf{v} \| \,\cdot\, \frac{\mathbf{v}}{\| \mathbf{v} \|}

as writing a nonzero vector \mathbf{v} in magnitude–direction form.
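
As a quick illustration (a supplementary sketch using Julia's standard LinearAlgebra library, not part of the original text), norm(v) gives the 2-norm by default, while norm(v, Inf) and norm(v, 1) give the other two; dividing by the norm performs the normalization above:

```julia
using LinearAlgebra

v = [2.0, -1.0, 2.0]
@show norm(v)        # 2-norm (default): sqrt(4 + 1 + 4) = 3.0
@show norm(v, Inf)   # ∞-norm: largest entry in absolute value, 2.0
@show norm(v, 1)     # 1-norm: sum of absolute values, 5.0

x = v / norm(v)      # normalization to a unit vector (in the 2-norm)
@show norm(x)        # equals 1.0, up to roundoff
```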

We say that a sequence of vectors \mathbf{x}_1,\mathbf{x}_2,\ldots converges to \mathbf{x} if

\lim_{k\rightarrow\infty} \norm{\mathbf{x}_k - \mathbf{x}} = 0.

By definition, a sequence is convergent in the infinity norm if and only if it converges componentwise. The same is true for a convergent sequence in any norm, because all norms on \real^n are equivalent in the sense that convergence in one of them implies convergence in all of them.
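
For instance (an illustrative snippet, not from the original text), the following sequence converges to \mathbf{x} = [0, 2], and the norm of the difference shrinks toward zero; the 2-norm is used here:

```julia
using LinearAlgebra

x = [0.0, 2.0]                    # proposed limit of the sequence
for k in (1, 10, 100, 1000)
    xk = [1/k, 2 + 1/k^2]         # kth term of the sequence
    println("k = $k:  norm(xk - x) = ", norm(xk - x))
end
```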

2.7.2 Matrix norms

Although we view matrices as two-dimensional, we can also interpret them as vectors: simply stack the columns on top of one another.[2] Hence we can define matrix norms via vector norms. This is sometimes done with the vector 2-norm and leads to the matrix Frobenius norm:

\| \mathbf{A} \|_F = \left( \sum_{i,j} |A_{ij}|^2 \right)^{1/2}.

However, it often proves to be more useful to define matrix norms differently.
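
The alternative is the matrix norm induced by a vector norm. In its standard form (restated here for reference), for an m\times n matrix \mathbf{A} and a choice of vector norms on \real^n and \real^m,

\| \mathbf{A} \| = \max_{\| \mathbf{x} \|=1} \| \mathbf{A}\mathbf{x} \| = \max_{\mathbf{x}\neq \boldsymbol{0}} \frac{\| \mathbf{A}\mathbf{x} \|}{\| \mathbf{x} \|}.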

The last equality above follows from linearity (as shown in Exercise 5). The induced norm is derived from the interpretation of a matrix as a linear operator between \real^n and \real^m. Thus in the 2-norm, for instance,

\| \mathbf{A} \|_2 = \max_{\| \mathbf{x} \|_2=1} \| \mathbf{A}\mathbf{x} \|_2.

For the rest of this section we will continue to omit subscripts when we want to refer to an unspecified induced norm; after this section, an unsubscripted norm is understood to be the 2-norm.

The definition of an induced matrix norm may seem oddly complicated. However, there are some key properties that follow directly from the definition.
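
Chief among them, stated here in their standard form, are the bounds

\| \mathbf{A}\mathbf{x} \| \le \| \mathbf{A} \|\,\| \mathbf{x} \|, \qquad \| \mathbf{A}\mathbf{B} \| \le \| \mathbf{A} \|\,\| \mathbf{B} \|, \qquad \| \mathbf{A}^k \| \le \| \mathbf{A} \|^k,

valid whenever the products are defined; the first follows directly from the definition, and the last follows from repeated application of the second.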

One can interpret the definition of an induced norm geometrically. Each vector \mathbf{x} on the unit “sphere” (as defined by the chosen vector norm) is mapped to its image \mathbf{A}\mathbf{x}, and the norm of \mathbf{A} is the radius of the smallest “sphere” that encloses all such images.
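
To make this concrete, here is a brief supplementary Julia sketch (not part of the original text) that samples unit vectors on the circle in \real^2 and records \| \mathbf{A}\mathbf{x} \|_2; the largest sampled value approximates the induced 2-norm:

```julia
using LinearAlgebra

A = [1.0 -2.0; 3.0 4.0]
θ = range(0, 2π, length=601)                   # angles parameterizing unit vectors
r = [norm(A * [cos(t), sin(t)]) for t in θ]    # ‖Ax‖₂ for each sampled unit vector x
@show maximum(r)      # radius of the smallest enclosing circle (approximately)
@show opnorm(A, 2)    # the exact induced 2-norm, for comparison
```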

In addition, two of the vector norms we have encountered lead to equivalent formulas that are easy to compute from the matrix elements.
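
In standard form (restated here for reference), these are

\| \mathbf{A} \|_\infty = \max_{1\le i \le m} \sum_{j=1}^{n} |A_{ij}|, \qquad \| \mathbf{A} \|_1 = \max_{1\le j \le n} \sum_{i=1}^{m} |A_{ij}|,

that is, the maximum row sum and the maximum column sum of absolute values, respectively.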

A mnemonic for these is that the ∞ symbol extends horizontally while the 1 character extends vertically, each indicating the direction of the summation in its formula. Also, both formulas give the same result for m\times 1 matrices as the corresponding vector norm. In both cases you must take absolute values of the matrix elements first.

The geometric interpretation of the matrix 2-norm shown in Demo 2.7.2, as the radius of the smallest circle (or sphere or hypersphere in higher dimensions) containing the images of all unit vectors, is not a practical means of computing the norm. Nor is there a simple formula like (2.7.14) or (2.7.15) for it. The computation of the matrix 2-norm is discussed further in Chapter 7.
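
As a quick numerical check (a supplementary snippet, not part of the original text), Julia's LinearAlgebra provides opnorm for the induced norms and norm for the Frobenius norm, and the results agree with the elementwise row-sum and column-sum formulas:

```julia
using LinearAlgebra

A = [1.0 -2.0; 3.0 4.0]

@show opnorm(A, 1)     # induced 1-norm: maximum column sum = 6.0
@show opnorm(A, Inf)   # induced ∞-norm: maximum row sum = 7.0
@show opnorm(A, 2)     # induced 2-norm (no simple elementwise formula)
@show norm(A)          # Frobenius norm: sqrt(1 + 4 + 9 + 16) = sqrt(30)

# The elementwise formulas reproduce the induced 1- and ∞-norms:
@show maximum(sum(abs.(A), dims=1))   # maximum column sum
@show maximum(sum(abs.(A), dims=2))   # maximum row sum
```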

2.7.3 Exercises

  1. ✍ Why is the vector 1-norm also called the taxicab norm?

  2. (a) Draw the unit “circle” in the ∞-norm, i.e., the set of all vectors \mathbf{x}\in\real^2 such that \| \mathbf{x} \|_\infty=1.

    (b) Draw the unit “circle” in the 1-norm.

  3. ✍ Prove that for all vectors \mathbf{x}\in\real^n,

    (a) \| \mathbf{x} \|_\infty \le \| \mathbf{x} \|_2; \qquad (b) \| \mathbf{x} \|_2 \le \| \mathbf{x} \|_1.

  4. ✍ Prove that for any vectors \mathbf{x}, \mathbf{y} in \real^n, |\mathbf{x}^T\mathbf{y}| \le \| \mathbf{x} \|_1\, \| \mathbf{y} \|_\infty.

  5. ✍ Prove using Definition 2.7.2 that for any induced matrix norm, matrix \mathbf{A}, and scalar c, \| c\mathbf{A} \| = |c|\cdot \| \mathbf{A} \|.

  6. ✍ Let \mathbf{A} = \displaystyle \begin{bmatrix} -1 & 1 \\ 2 & 2 \end{bmatrix}.

    (a) Find all vectors satisfying \|\mathbf{x}\|_\infty=1 and \| \mathbf{A}\mathbf{x} \|_\infty=\| \mathbf{A} \|_\infty.

    (b) Find a vector satisfying \|\mathbf{x}\|_1=1 and \| \mathbf{A}\mathbf{x} \|_1=\| \mathbf{A} \|_1.

    (c) Find a vector satisfying \|\mathbf{x}\|_2=1 such that \| \mathbf{A}\mathbf{x} \|_2=\| \mathbf{A} \|_2. (Hint: A unit two-dimensional vector is a function only of its angle with the x_1-axis. Use the definition of \|\mathbf{A}\|_2 as the maximum of \|\mathbf{A}\mathbf{x}\|_2, which is also a function of the angle.)

  7. ✍ Prove the equivalence of the two formulas for a matrix norm in (2.7.6).

  8. ✍ Prove that for any induced matrix norm and nonsingular matrix \mathbf{A}, \| \mathbf{A}^{-1} \| \ge (\| \mathbf{A} \|)^{-1}. (Hint: Apply Theorem 2.7.2.)

  9. (a) Prove that for any \mathbf{v}\in \real^n,

    \| \mathbf{v} \|_p \ge \max_{i=1,\ldots,n} |v_i|,

    where p=1, 2, or ∞.

    (b) Prove that for any \mathbf{A}\in\real^{n \times n},

    \| \mathbf{A} \|_p \ge \max_{i,j=1,\ldots,n} |A_{ij}|,

    where p=1, 2, or ∞. (Hint: For p=2, rearrange (2.7.8) for a well-chosen particular value of \mathbf{x}.)

  10. ✍ Prove using Definition 2.7.2 that if \mathbf{D} is a diagonal matrix, then \|\mathbf{D}\|_2 = \max_{i} |D_{ii}|. You may assume the matrix is real and square, but that does not affect the result or the proof in any significant way. (Hint: Let M=\max_{i} |D_{ii}|. Proceed in two stages, showing that \|\mathbf{D}\|_2\ge M and separately that \|\mathbf{D}\|_2\le M.)

  11. ✍ Suppose that \mathbf{A} is n\times n and that \| \mathbf{A} \|<1 in some induced matrix norm.

    (a) Show that \mathbf{I}-\mathbf{A} is nonsingular. (Hint: Use the definition of an induced matrix norm to show that if (\mathbf{I}-\mathbf{A})\mathbf{x}=\boldsymbol{0} for some nonzero \mathbf{x}, then \| \mathbf{A} \|\ge 1.)

    (b) Show that \lim_{m\rightarrow \infty} \mathbf{A}^m = \boldsymbol{0}. (For matrices as with vectors, we say \mathbf{B}_m \rightarrow \mathbf{L} if \| \mathbf{B}_m-\mathbf{L} \| \rightarrow 0.)

    (c) Use (a) and (b) to show that we may obtain the geometric series

    (\mathbf{I}-\mathbf{A})^{-1} = \sum_{k=0}^\infty \mathbf{A}^k.

    (Hint: Start with \left(\sum_{k=0}^m \mathbf{A}^k\right)(\mathbf{I}-\mathbf{A}) and take the limit.)

Footnotes
  1. The same statements work for vectors with complex entries, with complex modulus in place of absolute values.

  2. Column stacking is actually how matrices are stored in memory within Julia and is known as column-major order. MATLAB and FORTRAN also use column-major order, while C and Python use row-major order, in which the rows are stacked.