Answer: MLE of the mean μ
Prerequisites
- Multivariate Gaussian Distribution (PDF)
- Maximum Likelihood Estimation (MLE)
- Matrix Calculus
Step-by-Step Derivation
1. Write the Likelihood Function
The probability density function for a single sample $x_i \in \mathbb{R}^d$ from a multivariate Gaussian is:
$$p(x_i \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\right)$$
Assuming the samples $\{x_1, \dots, x_N\}$ are independent and identically distributed (i.i.d.), the likelihood function $L(\mu, \Sigma)$ is the product of the individual densities:
$$L(\mu, \Sigma) = \prod_{i=1}^{N} p(x_i \mid \mu, \Sigma)$$
2. Formulate the Log-Likelihood
To simplify the derivative, we take the natural logarithm of the likelihood function to get the log-likelihood $\ell(\mu, \Sigma)$:
$$\ell(\mu, \Sigma) = \log L(\mu, \Sigma) = \sum_{i=1}^{N} \log p(x_i \mid \mu, \Sigma)$$
$$\ell(\mu, \Sigma) = \sum_{i=1}^{N} \left( -\frac{d}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x_i - \mu)^T \Sigma^{-1}(x_i - \mu) \right)$$
Dropping the terms that do not depend on $\mu$, the objective function in $\mu$ is:
$$J(\mu) = -\frac{1}{2} \sum_{i=1}^{N} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$$
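As a quick sanity check, the expanded log-likelihood from Step 2 can be evaluated numerically and compared against a reference implementation. This is a minimal sketch assuming NumPy and SciPy are available; the sample data, mean, and covariance are arbitrary placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d, N = 3, 50
X = rng.normal(size=(N, d))          # N arbitrary i.i.d. samples (placeholder data)
mu = rng.normal(size=d)              # arbitrary candidate mean
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)      # random symmetric positive-definite covariance

# Log-likelihood from the expanded formula in Step 2
sign, logdet = np.linalg.slogdet(Sigma)
Sinv = np.linalg.inv(Sigma)
diffs = X - mu                       # rows are x_i - mu
quad = np.einsum('ij,jk,ik->i', diffs, Sinv, diffs)  # (x_i-mu)^T Sigma^{-1} (x_i-mu)
ll_manual = np.sum(-0.5 * d * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad)

# Same quantity via SciPy's reference density
ll_scipy = multivariate_normal.logpdf(X, mean=mu, cov=Sigma).sum()

assert np.allclose(ll_manual, ll_scipy)
```

The `einsum` call computes all $N$ quadratic forms in one shot, which avoids a Python loop over samples.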
3. Expand the Quadratic Term
Let's expand the term $(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$:
$$(x_i - \mu)^T \Sigma^{-1}(x_i - \mu) = x_i^T \Sigma^{-1} x_i - x_i^T \Sigma^{-1} \mu - \mu^T \Sigma^{-1} x_i + \mu^T \Sigma^{-1} \mu$$
Since $\Sigma$ is symmetric ($\Sigma = \Sigma^T$), its inverse $\Sigma^{-1}$ is also symmetric. Each cross term is a scalar and therefore equals its own transpose, so $x_i^T \Sigma^{-1} \mu = (x_i^T \Sigma^{-1} \mu)^T = \mu^T \Sigma^{-1} x_i$:
$$(x_i - \mu)^T \Sigma^{-1}(x_i - \mu) = x_i^T \Sigma^{-1} x_i - 2\mu^T \Sigma^{-1} x_i + \mu^T \Sigma^{-1} \mu$$
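The expansion above can be checked numerically on random inputs. A minimal sketch, assuming NumPy; the dimension and matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
x = rng.normal(size=d)
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sinv = A @ A.T + np.eye(d)   # symmetric positive-definite stand-in for Sigma^{-1}

# Left side: the quadratic form as written
lhs = (x - mu) @ Sinv @ (x - mu)
# Right side: the expanded form with the merged cross term
rhs = x @ Sinv @ x - 2 * mu @ Sinv @ x + mu @ Sinv @ mu

assert np.isclose(lhs, rhs)
```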
4. Compute the Derivative with respect to μ
Taking the partial derivative of $J(\mu)$ with respect to $\mu$:
$$\frac{\partial J(\mu)}{\partial \mu} = -\frac{1}{2} \sum_{i=1}^{N} \frac{\partial}{\partial \mu}\left( x_i^T \Sigma^{-1} x_i - 2(\Sigma^{-1} x_i)^T \mu + \mu^T \Sigma^{-1} \mu \right)$$
Using the hint identities:
- $\frac{\partial}{\partial \mu}\, x_i^T \Sigma^{-1} x_i = 0$ (constant w.r.t. $\mu$)
- $\frac{\partial}{\partial \mu}\left(-2(\Sigma^{-1} x_i)^T \mu\right) = -2\Sigma^{-1} x_i$
- $\frac{\partial}{\partial \mu}\left(\mu^T \Sigma^{-1} \mu\right) = \Sigma^{-1}\mu + (\Sigma^{-1})^T\mu = 2\Sigma^{-1}\mu$ (since $\Sigma^{-1}$ is symmetric)
Plugging these back into the sum (note that $\partial \ell / \partial \mu = \partial J / \partial \mu$, since the dropped terms are constant in $\mu$):
$$\frac{\partial J(\mu)}{\partial \mu} = -\frac{1}{2} \sum_{i=1}^{N} \left(-2\Sigma^{-1} x_i + 2\Sigma^{-1}\mu\right) = \sum_{i=1}^{N} \Sigma^{-1}(x_i - \mu)$$
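The closed-form gradient can be verified against central finite differences, since $J$ is smooth. A sketch assuming NumPy; the sizes `d`, `N` and all inputs are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 20
X = rng.normal(size=(N, d))          # arbitrary sample matrix, rows are x_i
mu = rng.normal(size=d)              # arbitrary evaluation point
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)      # random SPD covariance
Sinv = np.linalg.inv(Sigma)

def J(m):
    # J(mu) = -1/2 * sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
    diffs = X - m
    return -0.5 * np.einsum('ij,jk,ik->', diffs, Sinv, diffs)

# Closed-form gradient from Step 4: sum_i Sigma^{-1} (x_i - mu)
grad_closed = Sinv @ (X - mu).sum(axis=0)

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (J(mu + eps * e) - J(mu - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
assert np.allclose(grad_closed, grad_fd, atol=1e-4)
```

Because $J$ is quadratic in $\mu$, central differences are exact up to floating-point roundoff.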
5. Set Derivative to Zero and Solve for μ^
To find the maximum, set the derivative equal to the zero vector:
$$\sum_{i=1}^{N} \Sigma^{-1}(x_i - \hat{\mu}) = 0$$
Since $\Sigma^{-1}$ is a constant, invertible matrix, we can left-multiply both sides by $\Sigma$:
$$\sum_{i=1}^{N} (x_i - \hat{\mu}) = 0$$
$$\sum_{i=1}^{N} x_i - N\hat{\mu} = 0 \implies N\hat{\mu} = \sum_{i=1}^{N} x_i$$
$$\hat{\mu}_{ML} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Since the Hessian $\partial^2 J / \partial \mu \,\partial \mu^T = -N\Sigma^{-1}$ is negative definite, this critical point is indeed a maximum. This proves that the maximum likelihood estimate of the mean is simply the sample mean.
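To close the loop, one can confirm numerically that the sample mean both zeroes the gradient and beats nearby candidate means. A sketch under the same NumPy assumptions as above, with arbitrary placeholder data.

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 3, 100
X = rng.normal(size=(N, d))          # arbitrary i.i.d. sample
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)      # random SPD covariance
Sinv = np.linalg.inv(Sigma)

mu_hat = X.mean(axis=0)              # the MLE derived above: the sample mean

def J(m):
    diffs = X - m
    return -0.5 * np.einsum('ij,jk,ik->', diffs, Sinv, diffs)

# The gradient sum_i Sigma^{-1}(x_i - mu) vanishes at the sample mean...
assert np.allclose(Sinv @ (X - mu_hat).sum(axis=0), 0, atol=1e-8)

# ...and any randomly perturbed mean attains a strictly lower objective
for _ in range(10):
    assert J(mu_hat) > J(mu_hat + 0.1 * rng.normal(size=d))
```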