Answer: MLE of the covariance Σ
Prerequisites
- Multivariate Gaussian Distribution (PDF)
- Maximum Likelihood Estimation (MLE)
- Trace Trick in Matrix Algebra: $x^T A x = \mathrm{tr}(x^T A x) = \mathrm{tr}(A x x^T)$
- Matrix Calculus
Step-by-Step Derivation
1. Recall the Log-Likelihood Function
From the derivation in part (a), the log-likelihood function for N i.i.d. samples from a multivariate Gaussian is:
$$\ell(\mu, \Sigma) = \sum_{i=1}^{N} \left( -\frac{d}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right)$$
We want to maximize this with respect to Σ. Extracting only the terms containing Σ:
$$J(\Sigma) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{N} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$$
2. Apply the Trace Trick
The term $(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$ is a scalar, and the trace of a scalar is just the scalar itself. By the cyclic property of the trace, $\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$, we can reorder the factors:
$$(x_i - \mu)^T \Sigma^{-1} (x_i - \mu) = \mathrm{tr}\left((x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\right) = \mathrm{tr}\left((x_i - \mu)(x_i - \mu)^T \Sigma^{-1}\right)$$
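As an illustrative sanity check (not part of the derivation), the trace trick can be verified numerically with NumPy on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
x = rng.standard_normal(d)
mu = rng.standard_normal(d)

# Build a symmetric positive-definite Sigma so its inverse exists
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)

v = x - mu
# Left-hand side: the quadratic form, a scalar
quad = v @ Sigma_inv @ v
# Right-hand side: trace of the reordered product (x-mu)(x-mu)^T Sigma^{-1}
tr_form = np.trace(np.outer(v, v) @ Sigma_inv)

assert np.isclose(quad, tr_form)
```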
Now, substitute this back into J(Σ) and swap the sum and the trace (since trace is a linear operator):
$$J(\Sigma) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\,\mathrm{tr}\left(\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T \, \Sigma^{-1}\right)$$
Let's define the scatter matrix $S = \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T$. Note that $S$ is a $d \times d$ symmetric matrix. The objective simplifies to:
$$J(\Sigma) = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\,\mathrm{tr}(S\Sigma^{-1})$$
3. Compute the Matrix Derivative
Now we take the partial derivative of J(Σ) with respect to the matrix Σ.
Using the provided hints:
- $\frac{\partial}{\partial \Sigma}\log|\Sigma| = \Sigma^{-T}$
- $\frac{\partial}{\partial \Sigma}\,\mathrm{tr}(S\Sigma^{-1}) = -\,\Sigma^{-T} S^T \Sigma^{-T}$
Applying these rules:
$$\frac{\partial J(\Sigma)}{\partial \Sigma} = -\frac{N}{2}\Sigma^{-T} - \frac{1}{2}\left(-\,\Sigma^{-T} S^T \Sigma^{-T}\right) = -\frac{N}{2}\Sigma^{-T} + \frac{1}{2}\Sigma^{-T} S^T \Sigma^{-T}$$
Since $\Sigma$ is symmetric, $\Sigma^T = \Sigma$, so $\Sigma^{-T} = (\Sigma^{-1})^T = \Sigma^{-1}$.
Since $S$ is a sum of outer products $(x_i - \mu)(x_i - \mu)^T$, $S$ is also symmetric ($S^T = S$).
Thus, the equation simplifies to:
$$\frac{\partial J}{\partial \Sigma} = -\frac{N}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1} S \Sigma^{-1}$$
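This closed-form gradient can be checked against finite differences; the sketch below (illustrative NumPy, perturbing each entry of $\Sigma$ independently, with made-up data) compares the two:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 3, 50
X = rng.standard_normal((N, d))
mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu)  # scatter matrix

def J(Sigma):
    """The Sigma-dependent part of the log-likelihood."""
    return (-N / 2 * np.log(np.linalg.det(Sigma))
            - 0.5 * np.trace(S @ np.linalg.inv(Sigma)))

A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)  # symmetric positive definite
Sigma_inv = np.linalg.inv(Sigma)

# Closed-form gradient from the derivation
grad = -N / 2 * Sigma_inv + 0.5 * Sigma_inv @ S @ Sigma_inv

# Central finite differences on each matrix entry
eps = 1e-6
num = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        E = np.zeros((d, d))
        E[i, j] = eps
        num[i, j] = (J(Sigma + E) - J(Sigma - E)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-4)
```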
4. Set to Zero and Solve for $\hat{\Sigma}$
Set the derivative equal to the zero matrix:
$$-\frac{N}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1} S \Sigma^{-1} = 0$$
$$\frac{N}{2}\Sigma^{-1} = \frac{1}{2}\Sigma^{-1} S \Sigma^{-1}$$
Multiplying both sides by $\Sigma$ on the left and on the right:
$$N\,\Sigma\,\Sigma^{-1}\,\Sigma = \Sigma\,\Sigma^{-1}\, S\, \Sigma^{-1}\,\Sigma$$
$$N\Sigma = S$$
Solving for the estimate $\hat{\Sigma}$:
$$\hat{\Sigma}_{ML} = \frac{1}{N} S = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)(x_i - \mu)^T$$
Replacing the true mean $\mu$ with its ML estimate $\hat{\mu}$ derived in part (a), the final ML estimate for the covariance matrix is:
$$\hat{\Sigma}_{ML} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{\mu})(x_i - \hat{\mu})^T$$
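The final formula can be confirmed numerically: `np.cov` with `bias=True` uses exactly this $1/N$ normalization. A minimal sketch, assuming synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 500, 3
# Synthetic data with non-trivial scale and mean (illustrative only)
X = rng.standard_normal((N, d)) @ np.diag([1.0, 2.0, 0.5]) + np.array([1.0, -1.0, 0.0])

mu_hat = X.mean(axis=0)                  # ML estimate of the mean (part (a))
centered = X - mu_hat
Sigma_hat = centered.T @ centered / N    # (1/N) * sum of outer products

# np.cov with bias=True applies the same 1/N normalization
assert np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))
```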