Answer: MLE of the covariance $\Sigma$

Prerequisites

  • Multivariate Gaussian Distribution (PDF)
  • Maximum Likelihood Estimation (MLE)
  • Trace Trick in Matrix Algebra: $x^T A x = \text{tr}(x^T A x) = \text{tr}(A x x^T)$
  • Matrix Calculus

Step-by-Step Derivation

1. Recall the Log-Likelihood Function. From the derivation in part (a), the log-likelihood function for $N$ i.i.d. samples from a $d$-dimensional multivariate Gaussian is:

$$\ell(\mu, \Sigma) = \sum_{i=1}^N \left( -\frac{d}{2} \log(2\pi) - \frac{1}{2} \log |\Sigma| - \frac{1}{2} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right)$$

We want to maximize this with respect to $\Sigma$. Keeping only the terms that contain $\Sigma$:

$$J(\Sigma) = -\frac{N}{2} \log |\Sigma| - \frac{1}{2} \sum_{i=1}^N (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$$
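The steps below can each be checked numerically. As a reference point, here is a minimal NumPy sketch of this log-likelihood; the function name and array shapes are my own choices, not from the source:

```python
import numpy as np

def log_likelihood(X, mu, Sigma):
    """Log-likelihood of N i.i.d. rows of X (shape N x d) under N(mu, Sigma)."""
    N, d = X.shape
    diff = X - mu                                  # rows are (x_i - mu)
    Sigma_inv = np.linalg.inv(Sigma)
    # Quadratic forms (x_i - mu)^T Sigma^{-1} (x_i - mu), one per sample
    quad = np.einsum('ni,ij,nj->n', diff, Sigma_inv, diff)
    _, logdet = np.linalg.slogdet(Sigma)           # numerically stable log|Sigma|
    return -0.5 * (N * d * np.log(2.0 * np.pi) + N * logdet + quad.sum())
```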

2. Apply the Trace Trick. The term $(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$ is a scalar, and the trace of a scalar is just the scalar itself. By the cyclic property of the trace, $\text{tr}(ABC) = \text{tr}(CAB) = \text{tr}(BCA)$, we can reorder the factors:

$$(x_i - \mu)^T \Sigma^{-1} (x_i - \mu) = \text{tr}\left( (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right) = \text{tr}\left( (x_i - \mu)(x_i - \mu)^T \Sigma^{-1} \right)$$

Now, substitute this back into $J(\Sigma)$ and swap the sum and the trace (the trace is a linear operator):

$$J(\Sigma) = -\frac{N}{2} \log |\Sigma| - \frac{1}{2} \text{tr}\left( \sum_{i=1}^N (x_i - \mu)(x_i - \mu)^T \, \Sigma^{-1} \right)$$

Let's define the scatter matrix $S = \sum_{i=1}^N (x_i - \mu)(x_i - \mu)^T$. Note that $S$ is a $d \times d$ symmetric matrix. The objective simplifies to:

$$J(\Sigma) = -\frac{N}{2} \log |\Sigma| - \frac{1}{2} \text{tr}(S \Sigma^{-1})$$
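The trace trick is a purely algebraic identity, so it is easy to sanity-check on random data. A short NumPy verification (all variable names and the test setup are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
x, mu = rng.standard_normal(d), rng.standard_normal(d)
A = rng.standard_normal((d, d))
Sigma_inv = np.linalg.inv(A @ A.T + d * np.eye(d))   # inverse of a random SPD matrix

diff = x - mu
quadratic = diff @ Sigma_inv @ diff                   # scalar form
traced = np.trace(np.outer(diff, diff) @ Sigma_inv)   # cyclically reordered trace form
assert np.isclose(quadratic, traced)
```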

3. Compute the Matrix Derivative. Now we take the partial derivative of $J(\Sigma)$ with respect to the matrix $\Sigma$. Using the provided hints:

  • $\frac{\partial}{\partial \Sigma} \log |\Sigma| = \Sigma^{-T}$
  • $\frac{\partial}{\partial \Sigma} \text{tr}(S \Sigma^{-1}) = -\Sigma^{-T} S^T \Sigma^{-T}$

Applying these rules:

$$\frac{\partial}{\partial \Sigma} J(\Sigma) = -\frac{N}{2} \Sigma^{-T} - \frac{1}{2} \left( -\Sigma^{-T} S^T \Sigma^{-T} \right)$$

Since $\Sigma$ is symmetric, $\Sigma^T = \Sigma$, so $\Sigma^{-T} = (\Sigma^{-1})^T = \Sigma^{-1}$. Since $S$ is a sum of outer products $(x_i - \mu)(x_i - \mu)^T$, $S$ is also symmetric ($S^T = S$). Thus, the equation simplifies to:

$$\frac{\partial J}{\partial \Sigma} = -\frac{N}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} S \Sigma^{-1}$$
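As a sanity check on the matrix-calculus hints, this closed-form gradient can be compared against a central finite-difference approximation, perturbing one entry of $\Sigma$ at a time. A sketch on synthetic data (none of these names or values come from the source):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 3
X = rng.standard_normal((N, d))
mu = X.mean(axis=0)
diff = X - mu
S = diff.T @ diff                          # scatter matrix S, shape (d, d)

def J(Sigma):
    """The Sigma-dependent part of the log-likelihood."""
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * N * logdet - 0.5 * np.trace(S @ np.linalg.inv(Sigma))

A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)            # random SPD evaluation point
Sigma_inv = np.linalg.inv(Sigma)
grad = -0.5 * N * Sigma_inv + 0.5 * Sigma_inv @ S @ Sigma_inv

eps = 1e-6
num = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        E = np.zeros((d, d))
        E[i, j] = eps                      # perturb a single entry of Sigma
        num[i, j] = (J(Sigma + E) - J(Sigma - E)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-4)
```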

4. Set to Zero and Solve for $\hat{\Sigma}$. Set the derivative equal to the zero matrix:

$$-\frac{N}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} S \Sigma^{-1} = 0$$
$$\frac{N}{2} \Sigma^{-1} = \frac{1}{2} \Sigma^{-1} S \Sigma^{-1}$$

Multiply both sides on the left by $\Sigma$ and on the right by $\Sigma$:

$$N \Sigma \Sigma^{-1} \Sigma = \Sigma \Sigma^{-1} S \Sigma^{-1} \Sigma$$
$$N \Sigma = S$$

Solving for the estimate $\hat{\Sigma}$:

$$\hat{\Sigma}_{ML} = \frac{1}{N} S = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)(x_i - \mu)^T$$

Replacing the true mean $\mu$ with its ML estimate $\hat{\mu}$ derived in part (a), the final ML estimate for the covariance matrix is:

$$\hat{\Sigma}_{ML} = \frac{1}{N} \sum_{i=1}^N (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$
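This estimator is straightforward to compute. The sketch below (synthetic data, illustrative names) forms $\hat{\Sigma}_{ML}$ directly and checks it against NumPy's biased sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 3
X = rng.standard_normal((N, d))       # synthetic data

mu_hat = X.mean(axis=0)               # ML estimate of the mean (part (a))
diff = X - mu_hat
Sigma_hat = diff.T @ diff / N         # ML estimate of the covariance

# np.cov with bias=True uses the same 1/N normalization
assert np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))
```

Note the $1/N$ factor: the ML estimate is the biased sample covariance, whereas the unbiased estimator divides by $N - 1$ instead.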