Explanation: MLE for the Mean
Intuition
If someone gives you a bunch of data points (say, the height and weight of several people) and asks you, "What's the most likely center of this data?" your immediate reaction would naturally be to calculate the average.
This problem shows mathematically that for data following a Multivariate Gaussian (Normal) distribution, the Maximum Likelihood Estimate (MLE) of the center (mean vector ) is exactly what common sense dictates: the sample mean.
The Likelihood Landscape
Imagine the data points are scattered in a 2D space. The Gaussian distribution forms a bell-shaped curve over this space.
- We want to find the exact position of the "peak" (the mean ) of this bell curve such that the probability of having observed our given data points is maximized.
- The term is a distance measure (called Mahalanobis distance) between a data point and the mean , scaled by how stretched the bell curve is ().
To maximize the probability, we must minimize these distances.
graph TD
A[Data Points] -->|Calculate Mahalanobis Dist| B(Min distance to peak)
B -->|Take Derivative & Set to 0| C{Peak Location}
C -->|Equals| D[Sample Mean]
Why it Makes Sense
When we set the derivative to zero: The scales all dimensions but ultimately acts like a constant weight that doesn't change where the zero-point lies. The total "pull" or "error" from all points must balance out to exactly zero. The only point that perfectly balances the forces from all existing data points is the center of mass—which is the arithmetic average: .
Common Pitfalls
- Matrix derivative rules: Unlike scalar calculus where becomes , the derivative of a quadratic form gives . Forgetting that must be symmetric to simply call it is a frequent mistake. Here, we correctly relied on being symmetric.