
Review of linear algebra

1. Terminology

An ordinary number in $\mathbb{R}$ or $\mathbb{C}$ may be called a scalar. An $m\times n$ matrix $\mathbf{A}$ is a rectangular $m$-by-$n$ array of numbers called elements or entries. The numbers $m$ and $n$ are called the row dimension and the column dimension, respectively; collectively they describe the size or shape of $\mathbf{A}$. We say $\mathbf{A}$ belongs to the set $\mathbb{R}^{m\times n}$ if its entries are real, or $\mathbb{C}^{m\times n}$ if they are complex-valued. A square matrix has equal row and column dimensions. A row vector has dimension $1\times n$, while a column vector has dimension $m\times 1$.

In this text, all vectors are column vectors, and we use $\mathbb{R}^n$ or $\mathbb{C}^n$ to denote spaces of these vectors. When a row vector is needed, it is given an explicit transpose symbol (see below).

We use capital letters in bold to refer to matrices, and lowercase bold letters for vectors. The bold symbol $\boldsymbol{0}$ may refer to a vector of all zeros or to a zero matrix, depending on context; we use 0 as the scalar zero only.

To refer to a specific element of a matrix, we use the uppercase name of the matrix without boldface. For instance, $A_{24}$ refers to the $(2,4)$ element of $\mathbf{A}$. To refer to an element of a vector, we use just one subscript, as in $x_3$. A boldface character with one or more subscripts, on the other hand, is a matrix (uppercase) or vector (lowercase) that belongs to a numbered collection.
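
For readers who like to experiment, here is a minimal sketch of this convention in NumPy (the choice of NumPy and the sample matrix are our own, not the text's); note that NumPy indices start at 0 rather than 1:

```python
import numpy as np

# A 2x4 matrix: row dimension 2, column dimension 4.
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

print(A.shape)   # (2, 4), the size of A
print(A[1, 3])   # 8: the (2,4) element, since NumPy is 0-based
```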

We will have frequent need to refer to the individual columns of a matrix as vectors. We use a lowercase bold version of the matrix name with a subscript to represent the column number. For example, $\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n$ are the columns of the $m\times n$ matrix $\mathbf{A}$. Conversely, whenever we define a sequence of vectors $\mathbf{v}_1,\ldots,\mathbf{v}_p$, we can implicitly consider them to be columns of a matrix $\mathbf{V}$. Sometimes we might write

$$
\mathbf{V} = \bigl[ \mathbf{v}_j \bigr]
$$

to emphasize the connection between a matrix and its columns.
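
To make the column notation concrete, here is a small NumPy sketch (our own illustration; the matrix is arbitrary) that extracts columns and reassembles them:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

a1 = A[:, 0]   # column a_1
a2 = A[:, 1]   # column a_2

# Stacking the columns side by side recovers V = [v_j], i.e., A itself.
V = np.column_stack([a1, a2])
print(np.array_equal(A, V))   # True
```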

The diagonal (more specifically, main diagonal) of an $n\times n$ matrix $\mathbf{A}$ refers to the entries $A_{ii}$, $i=1,\ldots,n$. The entries $A_{ij}$ where $j-i=k$ are on a superdiagonal if $k>0$ and a subdiagonal if $k<0$. The diagonals are numbered as indicated here:

$$
\begin{bmatrix}
0 & 1 & 2 & \cdots & n-1 \\
-1 & 0 & 1 & \cdots & n-2 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
-n+2 & \cdots & -1 & 0 & 1 \\
-n+1 & \cdots & -2 & -1 & 0
\end{bmatrix}.
$$

A diagonal matrix is one whose entries are all zero off the main diagonal. An upper triangular matrix $\mathbf{U}$ has $U_{ij}=0$ if $i>j$, and a lower triangular matrix $\mathbf{L}$ has $L_{ij}=0$ if $i<j$.
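
A NumPy sketch of these definitions (the function names below are standard NumPy; the example matrix is ours):

```python
import numpy as np

A = np.arange(1.0, 17.0).reshape(4, 4)

print(np.diag(A))        # main diagonal (k = 0)
print(np.diag(A, k=1))   # first superdiagonal
print(np.diag(A, k=-1))  # first subdiagonal

U = np.triu(A)   # upper triangular part: zeros where i > j
L = np.tril(A)   # lower triangular part: zeros where i < j
```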

The transpose of $\mathbf{A}\in\mathbb{C}^{m\times n}$ is the matrix $\mathbf{A}^T\in\mathbb{C}^{n\times m}$ given by

$$
\mathbf{A}^T =
\begin{bmatrix}
A_{11} & A_{21} & \cdots & A_{m1} \\
\vdots & \vdots & & \vdots \\
A_{1n} & A_{2n} & \cdots & A_{mn}
\end{bmatrix}.
$$

The adjoint or hermitian of a matrix $\mathbf{A}$ is given by $\mathbf{A}^*=\overline{\mathbf{A}^T}$, where the bar denotes taking a complex conjugate elementwise.[1] If $\mathbf{A}$ is real, then $\mathbf{A}^*=\mathbf{A}^T$. A square matrix is symmetric if $\mathbf{A}^T=\mathbf{A}$ and hermitian if $\mathbf{A}^*=\mathbf{A}$.
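
The same ideas in NumPy (an illustration of ours, not from the text):

```python
import numpy as np

A = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 4 + 0j]])

At = A.T          # transpose
Ah = A.conj().T   # adjoint (hermitian): conjugate elementwise, then transpose

# A hermitian matrix equals its own adjoint.
H = np.array([[2 + 0j, 1 - 1j],
              [1 + 1j, 3 + 0j]])
print(np.allclose(H, H.conj().T))   # True
```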

2. Algebra

Matrices and vectors of the same size may be added elementwise. Multiplication by a scalar is also defined elementwise. These operations obey the familiar laws of commutativity, associativity, and distributivity. The multiplication of two matrices, on the other hand, is less straightforward.

There are two ways for vectors to be multiplied together. If $\mathbf{v}$ and $\mathbf{w}$ are in $\mathbb{C}^n$, their inner product is

$$
\mathbf{v}^* \mathbf{w} = \sum_{k=1}^n \overline{v_k}\, w_k.
$$

Trivially, one finds that $\mathbf{w}^* \mathbf{v} = \overline{(\mathbf{v}^*\mathbf{w})}$.
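
A quick numerical check of the inner product and its conjugate symmetry (a sketch of ours; note that np.vdot conjugates its first argument, matching $\mathbf{v}^*\mathbf{w}$):

```python
import numpy as np

v = np.array([1 + 1j, 2 - 1j, 0 + 3j])
w = np.array([2 + 0j, 1 + 1j, 1 - 2j])

ip = np.vdot(v, w)   # v* w: conjugates v, then sums the products

# w* v equals the conjugate of v* w.
print(np.isclose(np.vdot(w, v), np.conj(ip)))   # True
```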

Additionally, any two vectors $\mathbf{v}\in\mathbb{C}^m$ and $\mathbf{w}\in\mathbb{C}^n$ (with $m\neq n$ allowed) have an outer product, which is an $m\times n$ matrix:

$$
\mathbf{v} \mathbf{w}^* = \bigl[ v_i \overline{w_j} \bigr]_{\,i=1,\ldots,m,\; j=1,\ldots,n}
= \begin{bmatrix}
v_1\,\overline{w_1} & v_1\,\overline{w_2} & \cdots & v_1\,\overline{w_n} \\
v_2\,\overline{w_1} & v_2\,\overline{w_2} & \cdots & v_2\,\overline{w_n} \\
\vdots & \vdots & & \vdots \\
v_m\,\overline{w_1} & v_m\,\overline{w_2} & \cdots & v_m\,\overline{w_n}
\end{bmatrix}.
$$

For real vectors, the complex conjugates above have no effect and ${}^*$ becomes ${}^T$.
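
And the outer product in NumPy (our illustration; np.outer does not conjugate, so we conjugate $\mathbf{w}$ explicitly):

```python
import numpy as np

v = np.array([1 + 1j, 2 - 1j, 3 + 0j])   # v in C^3
w = np.array([1 - 2j, 0 + 1j])           # w in C^2

M = np.outer(v, w.conj())   # the 3x2 matrix v w*
print(M.shape)              # (3, 2)
```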

In order for matrices $\mathbf{A}$ and $\mathbf{B}$ to be multiplied, it is necessary that their inner dimensions match. Thus, if $\mathbf{A}$ is $m\times p$, then $\mathbf{B}$ must be $p\times n$. In terms of scalar components, the $(i,j)$ entry of $\mathbf{C}=\mathbf{A}\mathbf{B}$ is given by

$$
C_{ij} = \sum_{k=1}^p A_{ik} B_{kj}. \tag{6}
$$

Note that even if $\mathbf{A}\mathbf{B}$ is defined, $\mathbf{B}\mathbf{A}$ may not be. Moreover, even when both products are defined, they may not equal each other.

Matrix multiplication is associative, however:

$$
\mathbf{A}\mathbf{B}\mathbf{C} = (\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C}).
$$

Hence while we cannot change the ordering of the terms, we can change the order of the operations. This is a property that we will use repeatedly. We also note here the important identity

$$
(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T.
$$

In particular, if either side of this identity is defined, then both sides are defined and they are equal.
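
These multiplication rules are easy to spot-check numerically; the following sketch (random matrices of our own choosing) verifies associativity and the transpose identity:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 5))

# Associativity: the grouping of the products does not matter.
print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True

# (AB)^T = B^T A^T, with the factors reversed.
print(np.allclose((A @ B).T, B.T @ A.T))       # True

# Note B @ A is not even defined: inner dimensions 2 and 3 do not match.
```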

3. Linear combinations

It is worth reinterpreting (6) at a vector level. If $\mathbf{A}$ has dimensions $m\times n$, it can be multiplied on the right by an $n\times 1$ column vector $\mathbf{v}$ to produce an $m\times 1$ column vector $\mathbf{A}\mathbf{v}$, which satisfies

$$
\mathbf{A}\mathbf{v} =
\begin{bmatrix}
\sum_k A_{1k}v_k \\ \sum_k A_{2k}v_k \\ \vdots \\ \sum_k A_{mk}v_k
\end{bmatrix}
= v_1 \begin{bmatrix} A_{11}\\A_{21}\\\vdots\\A_{m1} \end{bmatrix}
+ v_2 \begin{bmatrix} A_{12}\\A_{22}\\\vdots\\A_{m2} \end{bmatrix}
+ \cdots
+ v_n \begin{bmatrix} A_{1n}\\A_{2n}\\\vdots\\A_{mn} \end{bmatrix}
= v_1 \mathbf{a}_1 + \cdots + v_n \mathbf{a}_n.
$$

We say that $\mathbf{A}\mathbf{v}$ is a linear combination of the columns of $\mathbf{A}$.
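
Here is the same fact checked numerically (our own small example):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([2.0, -1.0, 0.5])

Av = A @ v   # the matrix-vector product...

# ...equals the linear combination v_1 a_1 + v_2 a_2 + v_3 a_3.
combo = v[0] * A[:, 0] + v[1] * A[:, 1] + v[2] * A[:, 2]
print(np.allclose(Av, combo))   # True
```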

There is a similar interpretation of multiplying $\mathbf{A}$ on the left by a row vector. Keeping to our convention that boldface letters represent column vectors, we write, for $\mathbf{v}\in\mathbb{R}^m$,

$$
\begin{split}
\mathbf{v}^T \mathbf{A} &= \begin{bmatrix} \sum_k v_k A_{k1} & \sum_k v_k A_{k2} & \cdots & \sum_k v_k A_{kn} \end{bmatrix} \\
&= v_1 \begin{bmatrix} A_{11} & \cdots & A_{1n} \end{bmatrix}
+ v_2 \begin{bmatrix} A_{21} & \cdots & A_{2n} \end{bmatrix}
+ \cdots
+ v_m \begin{bmatrix} A_{m1} & \cdots & A_{mn} \end{bmatrix}.
\end{split}
$$

These two observations extend to more general matrix-matrix multiplications. One can show that (assuming that $\mathbf{A}$ is $m\times p$ and $\mathbf{B}$ is $p\times n$)

$$
\mathbf{A}\mathbf{B} = \mathbf{A} \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix}
= \begin{bmatrix} \mathbf{A}\mathbf{b}_1 & \mathbf{A}\mathbf{b}_2 & \cdots & \mathbf{A}\mathbf{b}_n \end{bmatrix}. \tag{11}
$$

Equivalently, if we write $\mathbf{A}$ in terms of rows, then

$$
\mathbf{A} = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_m^T \end{bmatrix}
\qquad \Longrightarrow \qquad
\mathbf{A}\mathbf{B} = \begin{bmatrix} \mathbf{w}_1^T \mathbf{B} \\ \mathbf{w}_2^T \mathbf{B} \\ \vdots \\ \mathbf{w}_m^T \mathbf{B} \end{bmatrix}. \tag{12}
$$

The representations of matrix multiplication are interchangeable; whichever one is most convenient at any moment can be used.
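
The column and row representations in (11) and (12) can likewise be confirmed numerically (a sketch with arbitrary random matrices of our own):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
AB = A @ B

# Column j of AB is A times column j of B, as in (11).
by_cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Row i of AB is row i of A times B, as in (12).
by_rows = np.vstack([A[i, :] @ B for i in range(A.shape[0])])

print(np.allclose(AB, by_cols), np.allclose(AB, by_rows))   # True True
```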

There is also an interpretation, presented in LU factorization, of matrix products in terms of vector outer products.

4. Identity and inverse

The identity matrix of size $n$, called $\mathbf{I}$ (or sometimes $\mathbf{I}_n$), is a diagonal $n\times n$ matrix with every diagonal entry equal to one. As can be seen from (11) and (12), it satisfies $\mathbf{A}\mathbf{I}=\mathbf{A}$ for $\mathbf{A}\in\mathbb{C}^{m\times n}$ and $\mathbf{I}\mathbf{B}=\mathbf{B}$ for $\mathbf{B}\in\mathbb{C}^{n\times p}$. It is therefore the matrix analog of the number 1, the multiplicative identity.

Note that a square matrix $\mathbf{A}$ can always be multiplied by itself to get a matrix of the same size. Hence we can define the integer powers $\mathbf{A}^2=(\mathbf{A})(\mathbf{A})$, $\mathbf{A}^3=(\mathbf{A}^2)\mathbf{A} = (\mathbf{A})\mathbf{A}^2$ (by associativity), and so on. By definition, $\mathbf{A}^0=\mathbf{I}$.
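
In NumPy, integer powers of a square matrix are available directly (our illustration):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

A2 = A @ A
A3 = np.linalg.matrix_power(A, 3)   # same as A @ A @ A
A0 = np.linalg.matrix_power(A, 0)   # the identity, by definition

print(np.array_equal(A0, np.eye(2)))   # True
print(np.allclose(A3, A2 @ A))         # True, by associativity
```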

If $\mathbf{A}$ is an $n\times n$ matrix, then there is at most one matrix $\mathbf{Z}$ of the same size such that

$$
\mathbf{Z}\mathbf{A} = \mathbf{A}\mathbf{Z} = \mathbf{I}.
$$

If $\mathbf{Z}$ exists, it is called the inverse of $\mathbf{A}$ and is written as $\mathbf{A}^{-1}$. In this situation we say that $\mathbf{A}$ is invertible.

The zero matrix has no inverse. For $n>1$ there are also nonzero matrices that have no inverse. Such matrices are called singular. The properties "invertible" and "singular" are exclusive opposites; thus, nonsingular means invertible and noninvertible means singular.
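
A short sketch contrasting an invertible matrix with a singular one (the example matrices are our own):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
Z = np.linalg.inv(A)
print(np.allclose(Z @ A, np.eye(2)))   # True: Z A = I, so Z = A^{-1}

# A nonzero singular matrix: its second row is twice its first.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError:
    print("S is singular: it has no inverse")
```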

5. Linear systems

Given a square $n\times n$ matrix $\mathbf{A}$ and $n$-vectors $\mathbf{x}$ and $\mathbf{b}$, the equation $\mathbf{A}\mathbf{x}=\mathbf{b}$ is equivalent to

$$
\begin{split}
A_{11}x_1 + A_{12}x_2 + \cdots + A_{1n}x_n &= b_1, \\
A_{21}x_1 + A_{22}x_2 + \cdots + A_{2n}x_n &= b_2, \\
\vdots \\
A_{n1}x_1 + A_{n2}x_2 + \cdots + A_{nn}x_n &= b_n.
\end{split}
$$

Basic facts about this system, such as the equivalence between invertibility of $\mathbf{A}$ and the existence of a unique solution $\mathbf{x}=\mathbf{A}^{-1}\mathbf{b}$ for every $\mathbf{b}$, are usually proved in any elementary text on linear algebra.
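
For completeness, here is how such a system is solved numerically (an illustration of ours; np.linalg.solve is preferred over forming the inverse explicitly):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)      # solves A x = b
print(np.allclose(A @ x, b))   # True
print(np.allclose(x, np.linalg.inv(A) @ b))   # True, but solve is preferred
```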

6. Exercises

  1. ✍ In racquetball, the winner of a rally serves the next rally. Generally, the server has an advantage. Suppose that when Ashley and Barbara are playing racquetball, Ashley wins 60% of the rallies she serves and Barbara wins 70% of the rallies she serves. If $\mathbf{x}\in\mathbb{R}^2$ is such that $x_1$ is the probability that Ashley serves first and $x_2=1-x_1$ is the probability that Barbara serves first, define a matrix $\mathbf{A}$ such that $\mathbf{A}\mathbf{x}$ is a vector of the probabilities that Ashley and Barbara each serve the second rally. What is the meaning of $\mathbf{A}^{10}\mathbf{x}$?

  2. ✍ Suppose we have lists of $n$ terms and $m$ documents. We can define an $m\times n$ matrix $\mathbf{A}$ such that $A_{ij}=1$ if term $j$ appears in document $i$, and $A_{ij}=0$ otherwise. Now suppose that the term list is

    "numerical", "analysis", "more", "cool", "accounting"

    and that $\mathbf{x} = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 \end{bmatrix}^T$. Give an interpretation of the product $\mathbf{A}\mathbf{x}$.

    Solution: Entry $i$ of $\mathbf{A}\mathbf{x}$ is $\sum_j A_{ij}x_j$, so the result is the number of the terms "numerical", "analysis", and "cool" that appear in document $i$. (Note: this is not the same as the total number of occurrences of the words; each term is counted only once.)

  3. ✍ Let

    $$
    \mathbf{A} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
    $$

    Show that $\mathbf{A}^n=\boldsymbol{0}$ when $n\ge 4$.

  4. ✍ Find two matrices $\mathbf{A}$ and $\mathbf{B}$, neither of which is the zero matrix, such that $\mathbf{A}\mathbf{B}=\boldsymbol{0}$.

  5. ✍ Prove that when $\mathbf{A}\mathbf{B}$ is defined, $\mathbf{B}^T\mathbf{A}^T$ is defined too, and use Equation (6) to show that $(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T$.

  6. ✍ Show that if $\mathbf{A}$ is invertible, then $(\mathbf{A}^T)^{-1}=(\mathbf{A}^{-1})^T$. (This matrix is often just written as $\mathbf{A}^{-T}$.)

  7. ✍ Prove true, or give a counterexample: The product of upper triangular square matrices is upper triangular.

Footnotes
  1. The conjugate of a complex number is found by replacing all references to the imaginary unit $i$ by $-i$. We do not use complex numbers until the second half of the book.