
7.4 Symmetry and definiteness

As we saw in Exploiting matrix structure, symmetry can simplify the LU factorization into the symmetric form $\mathbf{A}=\mathbf{L}\mathbf{D}\mathbf{L}^T$. Important specializations occur as well for the eigenvalue and singular value factorizations. In this section we stay with complex-valued matrices, so we are interested in the case when $\mathbf{A}^*=\mathbf{A}$, i.e., $\mathbf{A}$ is hermitian. However, we often loosely speak of symmetry to mean this property even in the complex case. All of the statements in this section easily specialize to the real case.

7.4.1 Normality

Suppose now that $\mathbf{A}^*=\mathbf{A}$ and that $\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*$ is an SVD. Since $\mathbf{S}$ is real and square, we have

$$\mathbf{A}^* = \mathbf{V} \mathbf{S}^* \mathbf{U}^* = \mathbf{V} \mathbf{S} \mathbf{U}^*,$$

and it's tempting to conclude that $\mathbf{U}=\mathbf{V}$. Happily, this is nearly true. The following theorem is typically proved in an advanced linear algebra course.

Theorem 7.4.1: If $\mathbf{A}^*=\mathbf{A}$, then $\mathbf{A}$ has an EVD $\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^*$ in which $\mathbf{V}$ is unitary and $\mathbf{D}$ is diagonal and real.

Another way to state the result of this theorem is that a hermitian matrix has real eigenvalues and a complete set of orthonormal eigenvectors—that is, it is normal. Because hermitian matrices are normal, their eigenvalue condition number is guaranteed to be 1 by Theorem 7.2.3.

The converse of Theorem 7.4.1 is also true: every normal matrix with real eigenvalues is hermitian. This was illustrated in Demo 7.2.3.

For a hermitian matrix, the EVD

$$\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}=\mathbf{V} \mathbf{D} \mathbf{V}^*$$

is almost an SVD: $\mathbf{V}$ is unitary, but the diagonal entries of $\mathbf{D}$, though real, may be negative or out of order. That is the only difference; the singular values of a hermitian matrix are the absolute values of its eigenvalues.
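To check this numerically, here is a minimal Julia sketch; the test matrix is an arbitrary symmetric example of our own, not one from the text. It confirms that the singular values of a hermitian matrix are the absolute values of its eigenvalues.

```julia
using LinearAlgebra

# An arbitrary real symmetric (hence hermitian) test matrix.
A = [4.0 -2.0 1.0; -2.0 3.0 0.0; 1.0 0.0 2.0]

λ = eigvals(A)    # real, since A is hermitian
σ = svdvals(A)    # nonnegative, sorted in decreasing order

# The singular values are the absolute values of the eigenvalues,
# once both are sorted in decreasing order.
@show sort(abs.(λ), rev=true) ≈ σ
```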

7.4.2 Rayleigh quotient

Recall that for a matrix $\mathbf{A}$ and compatible vector $\mathbf{x}$, the quadratic form $\mathbf{x}^* \mathbf{A} \mathbf{x}$ is a scalar. For nonzero $\mathbf{x}$, the Rayleigh quotient is defined as

$$R_{\mathbf{A}}(\mathbf{x}) = \frac{\mathbf{x}^* \mathbf{A} \mathbf{x}}{\mathbf{x}^* \mathbf{x}}.$$

If $\mathbf{v}$ is an eigenvector such that $\mathbf{A} \mathbf{v}=\lambda \mathbf{v}$, then one easily calculates that $R_{\mathbf{A}}(\mathbf{v})=\lambda$. That is, the Rayleigh quotient maps an eigenvector to its associated eigenvalue.

If $\mathbf{A}^*=\mathbf{A}$, then the Rayleigh quotient has another interesting property: $\nabla R_{\mathbf{A}}(\mathbf{v})=\boldsymbol{0}$ if $\mathbf{v}$ is an eigenvector. By a multidimensional Taylor series, then,

$$R_{\mathbf{A}}(\mathbf{v}+\epsilon\mathbf{z}) = R_{\mathbf{A}}(\mathbf{v}) + 0 + O(\epsilon^2) = \lambda + O(\epsilon^2),$$

as $\epsilon\to 0$. The conclusion is that an eigenvector estimate accurate to $O(\epsilon)$ yields, via the Rayleigh quotient, an eigenvalue estimate accurate to $O(\epsilon^2)$.
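Here is a minimal Julia sketch of this effect; the helper function rayleigh and the test setup are our own, using the same matrix that appears in Exercise 5 below. Perturbing an exact eigenvector by a vector of size $\epsilon$ changes the Rayleigh quotient by only $O(\epsilon^2)$.

```julia
using LinearAlgebra

A = [3.0 -2.0; -2.0 0.0]     # hermitian; eigenvalues are -1 and 4
λ, V = eigen(A)
v = V[:, 1]                  # unit eigenvector for λ[1] = -1

# Rayleigh quotient (our own helper, following the definition above).
rayleigh(A, x) = (x' * A * x) / (x' * x)

z = normalize(randn(2))      # random unit perturbation direction
for ϵ in [1e-1, 1e-2, 1e-3]
    err = abs(rayleigh(A, v + ϵ*z) - λ[1])
    println("ϵ = $ϵ:  eigenvalue error = $err")   # shrinks like ϵ^2
end
```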

7.4.3 Definite, semidefinite, and indefinite matrices

In the real case, we called a symmetric matrix $\mathbf{A}$ symmetric positive definite (SPD) if $\mathbf{x}^T \mathbf{A}\mathbf{x} > 0$ for all nonzero vectors $\mathbf{x}$. In the complex case the relevant property is hermitian positive definite (HPD), meaning that $\mathbf{A}^*=\mathbf{A}$ and $\mathbf{x}^* \mathbf{A}\mathbf{x} > 0$ for all nonzero complex vectors $\mathbf{x}$. Putting this property together with the Rayleigh quotient leads to the following.

Theorem 7.4.3: A hermitian matrix is positive definite if and only if all of its eigenvalues are positive.

According to Theorem 7.4.3, for an HPD matrix, the EVD $\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^*$ meets all the requirements of the SVD, provided the ordering of eigenvalues is chosen appropriately.
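A quick numerical confirmation in Julia; the construction $\mathbf{B}^T\mathbf{B}+\mathbf{I}$, which is always HPD, is our own choice for illustration.

```julia
using LinearAlgebra

B = randn(4, 4)
A = B' * B + I        # HPD: x'Ax = |Bx|² + |x|² > 0 for x ≠ 0

# Eigenvalues (ascending) match singular values (descending, so re-sorted).
@show eigvals(Hermitian(A)) ≈ sort(svdvals(A))
```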

A hermitian matrix with all negative eigenvalues is called negative definite, and one with eigenvalues of both signs is indefinite. Finally, if one or more eigenvalues are zero and the rest share a single sign, the matrix is positive or negative semidefinite.
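Since the classification depends only on the signs of the eigenvalues, it can be tested numerically. The sketch below is our own illustration; the helper classify and its tolerance tol are hypothetical, and a tolerance is needed because an eigenvalue that is exactly zero in theory is rarely exactly zero in floating point.

```julia
using LinearAlgebra

# Classify a hermitian matrix by the signs of its eigenvalues.
function classify(A; tol=1e-12)
    λ = eigvals(Hermitian(A))          # real eigenvalues
    npos = count(x -> x > tol, λ)
    nneg = count(x -> x < -tol, λ)
    nzero = length(λ) - npos - nneg
    nneg == 0 && nzero == 0 && return "positive definite"
    npos == 0 && nzero == 0 && return "negative definite"
    nneg == 0 && return "positive semidefinite"
    npos == 0 && return "negative semidefinite"
    return "indefinite"
end

classify([2.0 1.0; 1.0 2.0])      # positive definite (eigenvalues 1, 3)
classify([1.0 1.0; 1.0 1.0])      # positive semidefinite (eigenvalues 0, 2)
classify([3.0 -2.0; -2.0 0.0])    # indefinite (eigenvalues -1, 4)
```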

7.4.4 Exercises

  1. ✍ Each line below is an EVD for a hermitian matrix. State whether the matrix is definite, indefinite, or semidefinite. Then state whether the given factorization is also an SVD, and if it is not, modify it to find an SVD.

    (a) $\begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

    (b) $\begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -0.5 \\ -0.5 & -1 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0.8 & -0.4 \\ -0.4 & -0.8 \end{bmatrix}$

    (c) $\begin{bmatrix} -5 & 3 \\ 3 & -5 \end{bmatrix} = \begin{bmatrix} \alpha & \alpha \\ \alpha & -\alpha \end{bmatrix} \begin{bmatrix} -2 & 0 \\ 0 & -8 \end{bmatrix} \begin{bmatrix} \alpha & \alpha \\ \alpha & -\alpha \end{bmatrix}, \quad \alpha=1/\sqrt{2}$

  2. ⌨ Determine whether each matrix is positive definite, negative definite, positive or negative semidefinite, or indefinite.

    (a) matrixdepot("pei",5)-6I

    (b) matrixdepot("hilb",8)-2I

    (c) matrixdepot("dingdong",20)

    (d) matrixdepot("lehmer",100)

    (e) matrixdepot("fiedler",200)

  3. ✍ Prove true, or give a counterexample: If $\mathbf{A}$ and $\mathbf{B}$ are hermitian matrices of the same size, then

    $$R_{\mathbf{A}+\mathbf{B}}(\mathbf{x}) = R_{\mathbf{A}}(\mathbf{x})+R_{\mathbf{B}}(\mathbf{x}).$$
  4. ⌨ The range of the function $R_{\mathbf{A}}(\mathbf{x})$ is a subset of the complex plane known as the field of values of the matrix $\mathbf{A}$. Use 500 random vectors to plot points in the field of values of $\mathbf{A} = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 2 & 0 \\ -2 & 0 & 1 \end{bmatrix}$. Then compute its eigenvalues and guess what the exact field of values is.

    (Answer: it's the interval $[-1,3]$, between the extreme eigenvalues.)
  5. ✍ Let $\mathbf{A}=\begin{bmatrix} 3 & -2 \\ -2 & 0 \end{bmatrix}$.

    (a) Write out $R_{\mathbf{A}}(\mathbf{x})$ explicitly as a function of $x_1$ and $x_2$.

    (b) Find $R_{\mathbf{A}}(\mathbf{x})$ for $x_1=1$, $x_2=2$.

    (c) Find the gradient vector $\nabla R_{\mathbf{A}}(\mathbf{x})$.

    (d) Show that the gradient vector is zero when $x_1=1$, $x_2=2$.

  6. ✍ A skew-hermitian matrix is one that satisfies $\mathbf{A}^*=-\mathbf{A}$. Show that if $\mathbf{A}$ is skew-hermitian, then $R_{\mathbf{A}}$ is imaginary-valued.

  7. ⌨ Thanks largely to Theorem 7.4.1, the eigenvalue problem for symmetric/hermitian matrices is easier than for general matrices.

    (a) Let $\mathbf{A}$ be a $1000\times 1000$ random real matrix, and let $\mathbf{S}=\mathbf{A}+\mathbf{A}^T$. Using @elapsed, time the eigvals function for $\mathbf{A}$ and then for $\mathbf{S}$. You should find that the computation for $\mathbf{S}$ is around an order of magnitude faster.

    (b) Perform the experiment from part (a) on $n\times n$ matrices for $n=200,300,\ldots,1600$. Plot running time as a function of $n$ for both matrices on a single log-log plot. Is the ratio of running times roughly constant, or does it grow with $n$?