Answer.md

Pre-required Knowledge

  1. Log-Likelihood Function: From part (a), the log-likelihood is:
     $$\ell(\mu, \Sigma) = -\frac{Nd}{2}\log(2\pi) - \frac{N}{2}\log|\Sigma| - \frac{1}{2} \sum_{i=1}^N (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$$
  2. Trace Method: A scalar equals its own trace, and the trace is invariant under cyclic permutation, so $a^T B a = \text{tr}(a^T B a) = \text{tr}(B a a^T)$. This lets us collect the vectors inside the summation into a single matrix.
  3. Matrix Derivatives (Given):
    • $\frac{\partial}{\partial X} \log |X| = X^{-T}$. Since $\Sigma$ is symmetric, $\frac{\partial}{\partial \Sigma} \log |\Sigma| = \Sigma^{-1}$.
    • $\frac{\partial}{\partial X} \text{tr}(X^{-1}A) = -X^{-T} A^T X^{-T}$. Since $\Sigma$ and $A$ (which will be the scatter matrix) are symmetric, this simplifies to $-\Sigma^{-1} A \Sigma^{-1}$.
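
The two given identities can be sanity-checked numerically. The sketch below is not part of the original answer: it assumes NumPy, and `numerical_gradient` is a hypothetical helper that compares each closed form against a central finite difference on a random, non-symmetric test matrix.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f w.r.t. each entry of X."""
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            grad[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
d = 3
# Arbitrary test matrices; X stays close to 2*I so that det(X) > 0.
X = 0.1 * rng.standard_normal((d, d)) + 2.0 * np.eye(d)
A = rng.standard_normal((d, d))
X_inv_T = np.linalg.inv(X).T

# Identity 1: d/dX log|X| = X^{-T}
grad_logdet = numerical_gradient(lambda M: np.log(np.linalg.det(M)), X)
logdet_ok = np.allclose(grad_logdet, X_inv_T, atol=1e-5)

# Identity 2: d/dX tr(X^{-1} A) = -X^{-T} A^T X^{-T}
grad_trace = numerical_gradient(lambda M: np.trace(np.linalg.inv(M) @ A), X)
trace_ok = np.allclose(grad_trace, -X_inv_T @ A.T @ X_inv_T, atol=1e-5)

print(logdet_ok, trace_ok)
```

Both flags should come out true. For symmetric `X` and `A` the transposes drop out, which is the simplified form used in the derivation.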

Step-by-Step Answer

  1. Substitute the ML estimate of $\mu$: We replace $\mu$ with $\hat{\mu}_{ML} = \frac{1}{N} \sum_{i=1}^N x_i$, written $\hat{\mu}$ below. Let $S = \sum_{i=1}^N (x_i - \hat{\mu})(x_i - \hat{\mu})^T$ be the scatter matrix.

  2. Rewrite the Log-Likelihood using Trace: With $\mu = \hat{\mu}$, each term in the summation is a scalar, so
     $$(x_i - \hat{\mu})^T \Sigma^{-1} (x_i - \hat{\mu}) = \text{tr}\left( (x_i - \hat{\mu})^T \Sigma^{-1} (x_i - \hat{\mu}) \right) = \text{tr}\left( \Sigma^{-1} (x_i - \hat{\mu})(x_i - \hat{\mu})^T \right)$$
     Summing over $i$ and using the linearity of the trace:
     $$\sum_{i=1}^N (x_i - \hat{\mu})^T \Sigma^{-1} (x_i - \hat{\mu}) = \text{tr}\left( \Sigma^{-1} \sum_{i=1}^N (x_i - \hat{\mu})(x_i - \hat{\mu})^T \right) = \text{tr}(\Sigma^{-1} S)$$

    So, up to an additive constant that does not depend on $\Sigma$, the log-likelihood is:
     $$\ell(\Sigma) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\text{tr}(\Sigma^{-1} S) + \text{const}$$

  3. Differentiate with respect to $\Sigma$: Using the provided identities:

    • $\frac{\partial}{\partial \Sigma} \log|\Sigma| = \Sigma^{-T} = \Sigma^{-1}$ (since $\Sigma$ is symmetric).
    • $\frac{\partial}{\partial \Sigma} \text{tr}(\Sigma^{-1} S) = -\Sigma^{-T} S^T \Sigma^{-T}$. Since $S$ and $\Sigma$ are symmetric, this is $-\Sigma^{-1} S \Sigma^{-1}$.

    Combining the two terms:
     $$\frac{\partial \ell}{\partial \Sigma} = -\frac{N}{2} \Sigma^{-1} - \frac{1}{2} \left(-\Sigma^{-1} S \Sigma^{-1}\right) = -\frac{N}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} S \Sigma^{-1}$$
  4. Set derivative to zero and solve:

    $$\begin{aligned} -\frac{N}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} S \Sigma^{-1} &= 0 \\ \Sigma^{-1} S \Sigma^{-1} &= N \Sigma^{-1} \end{aligned}$$

    Multiply by $\Sigma$ on the left and on the right:

    $$\begin{aligned} \Sigma \left(\Sigma^{-1} S \Sigma^{-1}\right) \Sigma &= \Sigma \left(N \Sigma^{-1}\right) \Sigma \\ S &= N \Sigma \\ \Sigma &= \frac{1}{N} S \end{aligned}$$

    So, $\hat{\Sigma}_{ML} = \frac{1}{N} \sum_{i=1}^N (x_i - \hat{\mu})(x_i - \hat{\mu})^T$
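
As a rough numerical check of the derivation (a sketch under stated assumptions, not part of the original answer), the snippet below draws synthetic Gaussian data with NumPy and confirms three things: the trace rewrite of the quadratic term, that the gradient from step 3 vanishes at $S/N$, and that small symmetric perturbations of $S/N$ lower the log-likelihood. The values `true_mu` and `true_Sigma` and the helper `log_likelihood` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 500, 3

# Synthetic data from a Gaussian with made-up parameters (illustrative only).
true_mu = np.array([1.0, -2.0, 0.5])
L = rng.standard_normal((d, d))
true_Sigma = L @ L.T + d * np.eye(d)
X = rng.multivariate_normal(true_mu, true_Sigma, size=N)

mu_hat = X.mean(axis=0)            # ML estimate of the mean
centered = X - mu_hat
S = centered.T @ centered          # scatter matrix
Sigma_hat = S / N                  # candidate ML estimate of the covariance

def log_likelihood(Sigma):
    """Gaussian log-likelihood at mu = mu_hat, written with tr(Sigma^{-1} S)."""
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.trace(np.linalg.solve(Sigma, S))
    return -0.5 * N * d * np.log(2 * np.pi) - 0.5 * N * logdet - 0.5 * quad

# Step 2: the sum of quadratic forms equals tr(Sigma^{-1} S).
Sigma_inv = np.linalg.inv(Sigma_hat)
quad_sum = np.einsum("ni,ij,nj->", centered, Sigma_inv, centered)
trace_form = np.trace(Sigma_inv @ S)

# Steps 3 and 4: the gradient vanishes at Sigma_hat = S / N.
grad = -0.5 * N * Sigma_inv + 0.5 * Sigma_inv @ S @ Sigma_inv
grad_is_zero = np.allclose(grad, 0.0, atol=1e-8)

# Small symmetric perturbations should give a lower log-likelihood.
best = log_likelihood(Sigma_hat)
perturbed_is_worse = True
for _ in range(20):
    B = rng.standard_normal((d, d))
    perturbed_is_worse &= bool(best > log_likelihood(Sigma_hat + 0.05 * (B + B.T)))

# np.cov with bias=True also normalizes by N rather than N - 1.
matches_numpy = np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))

print(grad_is_zero, perturbed_is_worse, matches_numpy)
```

Note that the estimator divides by $N$, not $N - 1$, so it matches `np.cov` only with `bias=True`; NumPy's default is the unbiased $N - 1$ normalization.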