
Explanation for 1.6 (d) - Covariance Matrix Eigendecomposition

Based on the derivation in Problem 1.6 (d), here is a detailed breakdown of how we get from eigenvalues to the full matrix decomposition $\Sigma = V \Lambda V^T$.

1. From Vectors to Matrices

The most common point of confusion is the move from the single-vector equation to the full matrix equation.

We start with the definition of an eigenvector $v_i$ and eigenvalue $\lambda_i$: $\Sigma v_i = \lambda_i v_i$.

If we have $d$ eigenvectors ($v_1, \dots, v_d$), we can write them all side-by-side using matrix multiplication.

Left-hand side ($\Sigma V$): When you multiply a matrix $\Sigma$ by a matrix $V$ (where $V$ is made of columns $v_1, \dots, v_d$), the result is simply $\Sigma$ multiplying each column individually:

$$\Sigma V = \Sigma \begin{bmatrix} | & & | \\ v_1 & \dots & v_d \\ | & & | \end{bmatrix} = \begin{bmatrix} | & & | \\ \Sigma v_1 & \dots & \Sigma v_d \\ | & & | \end{bmatrix}$$

Since $\Sigma v_i = \lambda_i v_i$, we can replace the columns:

$$= \begin{bmatrix} | & & | \\ \lambda_1 v_1 & \dots & \lambda_d v_d \\ | & & | \end{bmatrix}$$

Right-hand side ($V \Lambda$): Now look at $V \Lambda$. If you multiply a matrix $V$ on the right by a diagonal matrix $\Lambda$, it scales each column of $V$ by the corresponding diagonal element:

$$\begin{bmatrix} | & & | \\ v_1 & \dots & v_d \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{bmatrix} = \begin{bmatrix} | & & | \\ \lambda_1 v_1 & \dots & \lambda_d v_d \\ | & & | \end{bmatrix}$$

Conclusion: Since the columns match exactly, we have proven: $\Sigma V = V \Lambda$.
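As a minimal numerical sketch of this column-matching argument, here is a NumPy check on a small made-up symmetric matrix (the matrix `sigma` below is purely illustrative, not from the problem):

```python
import numpy as np

# A small symmetric (covariance-like) matrix, chosen only for illustration.
sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

# eigh handles symmetric matrices: eigenvalues come back in ascending order,
# and the eigenvectors are the columns of V.
eigenvalues, V = np.linalg.eigh(sigma)
Lam = np.diag(eigenvalues)

lhs = sigma @ V   # Sigma applied to every eigenvector at once
rhs = V @ Lam     # each column of V scaled by its eigenvalue

print(np.allclose(lhs, rhs))  # True: Sigma V = V Lambda

# Column-by-column version of the same statement: Sigma v_i = lambda_i v_i
for i in range(len(eigenvalues)):
    print(np.allclose(sigma @ V[:, i], eigenvalues[i] * V[:, i]))
```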

2. Why $V$ is Orthogonal ($V^T = V^{-1}$)

The problem states that $\Sigma$ is a covariance matrix.

  • Covariance matrices are always symmetric ($\Sigma = \Sigma^T$).
  • A key theorem in linear algebra (the Spectral Theorem) states that real symmetric matrices always have a full set of mutually orthogonal eigenvectors.

This means the dot product of any two different eigenvectors is 0, and we normalize them so their length is 1: $v_i^T v_j = 0 \ \text{(if } i \neq j\text{)}, \quad v_i^T v_i = 1$.

In matrix form, calculating $V^T V$:

$$V^T V = \begin{bmatrix} - \, v_1^T \, - \\ \vdots \\ - \, v_d^T \, - \end{bmatrix} \begin{bmatrix} | & & | \\ v_1 & \dots & v_d \\ | & & | \end{bmatrix} = \begin{bmatrix} 1 & 0 & \dots \\ 0 & 1 & \dots \\ \vdots & \vdots & \ddots \end{bmatrix} = I$$

Since $V^T V = I$, by definition $V^T$ is the inverse of $V$. And because $V$ is square, $V^T V = I$ also implies $V V^T = I$, which is the form we will use in the next step.
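Continuing the same illustrative sketch, the orthonormality of the eigenvectors (and hence $V^T = V^{-1}$) can be checked directly; `sigma` is again just a made-up symmetric matrix:

```python
import numpy as np

sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])           # same illustrative symmetric matrix
_, V = np.linalg.eigh(sigma)             # eigh returns orthonormal eigenvectors as columns

# Pairwise dot products: 1 on the diagonal (unit length), 0 off the diagonal.
print(np.allclose(V.T @ V, np.eye(2)))      # True: V^T V = I
print(np.allclose(V @ V.T, np.eye(2)))      # True: V V^T = I, since V is square
print(np.allclose(np.linalg.inv(V), V.T))   # True: V^{-1} = V^T
```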

3. Solving for $\Sigma$

Now we just rearrange the algebra:

  1. Start with: $\Sigma V = V \Lambda$
  2. Multiply both sides by $V^T$ (from the right): $\Sigma V V^T = V \Lambda V^T$
  3. Since $V V^T = I$, the $V$'s on the left cancel out: $\Sigma = V \Lambda V^T$ (verified numerically in the sketch below).
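A short numerical check of this final identity, reconstructing $\Sigma$ from its eigenvectors and eigenvalues (same illustrative matrix as in the earlier sketches):

```python
import numpy as np

sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])            # illustrative covariance-like matrix
eigenvalues, V = np.linalg.eigh(sigma)
Lam = np.diag(eigenvalues)

reconstructed = V @ Lam @ V.T             # Sigma = V Lambda V^T
print(np.allclose(sigma, reconstructed))  # True
```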

Geometric Interpretation

This equation tells us that any covariance matrix can be thought of as:

  1. Rotating the space to align with the data's principal axes ($V^T$).
  2. Stretching along those axes based on the variances ($\Lambda$).
  3. Rotating back to the original orientation ($V$), as illustrated in the sketch below.
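To see this three-step picture in action, applying $V^T$, then $\Lambda$, then $V$ to a vector gives the same result as applying $\Sigma$ directly; the test vector `x` and the matrix `sigma` are arbitrary illustrative choices:

```python
import numpy as np

sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
eigenvalues, V = np.linalg.eigh(sigma)
Lam = np.diag(eigenvalues)

x = np.array([1.0, -2.0])      # an arbitrary test vector

step1 = V.T @ x                # 1. rotate into the eigenvector (principal-axis) basis
step2 = Lam @ step1            # 2. stretch each coordinate by its variance (eigenvalue)
step3 = V @ step2              # 3. rotate back to the original coordinates

print(np.allclose(step3, sigma @ x))  # True: V Lambda V^T x = Sigma x
```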