Answer
Pre-required Knowledge
- MAP Estimation: Maximum A Posteriori estimation finds the mode of the posterior distribution.
- Mode of Gaussian: For a Gaussian distribution $\mathcal{N}(\mu, \Sigma)$, the mode is equal to the mean $\mu$.
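- Least Squares Solution: $\hat{\theta}_{LS} = (\Phi\Phi^T)^{-1}\Phi y$ (in the notation of this problem).
- Weighted Least Squares: $\hat{\theta}_{WLS} = (\Phi W \Phi^T)^{-1}\Phi W y$, with weight matrix $W$.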
Step-by-Step Answer
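- Determine $\hat{\theta}_{MAP}$: Since the posterior is Gaussian with mean $\hat{\mu}_\theta$, the maximum of the density occurs exactly at the mean. Thus:
  $$\hat{\theta}_{MAP} = \hat{\mu}_\theta = (\Gamma^{-1} + \Phi\Sigma^{-1}\Phi^T)^{-1}\Phi\Sigma^{-1}y$$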
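- Comparison to Weighted Least Squares (WLS): The WLS estimate arises from Maximum Likelihood estimation when accounting for the observation covariance $\Sigma$ (i.e., with weight matrix $W = \Sigma^{-1}$):
  $$\hat{\theta}_{WLS} = (\Phi\Sigma^{-1}\Phi^T)^{-1}\Phi\Sigma^{-1}y$$
  Difference: The MAP estimate has an extra term $\Gamma^{-1}$ added to the "scatter matrix" $\Phi\Sigma^{-1}\Phi^T$ inside the inverse.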
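The difference between the two estimators can be checked numerically. The following is a minimal sketch on synthetic data, assuming the convention that the columns of `Phi` are feature vectors, that `Sigma` is the noise covariance, and that `Gamma` is the prior covariance of a zero-mean Gaussian prior; the function names and all numeric values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 50                                 # features, observations (illustrative)
Phi = rng.normal(size=(d, n))                # assumed: columns are feature vectors
theta_true = np.array([1.0, -2.0, 0.5])
noise_var = rng.uniform(0.1, 2.0, size=n)    # heteroscedastic noise variances
Sigma = np.diag(noise_var)                   # observation covariance
y = Phi.T @ theta_true + rng.normal(size=n) * np.sqrt(noise_var)


def wls_estimate(Phi, y, Sigma):
    """(Phi Sigma^{-1} Phi^T)^{-1} Phi Sigma^{-1} y"""
    A = Phi @ np.linalg.solve(Sigma, Phi.T)
    b = Phi @ np.linalg.solve(Sigma, y)
    return np.linalg.solve(A, b)


def map_estimate(Phi, y, Sigma, Gamma):
    """(Gamma^{-1} + Phi Sigma^{-1} Phi^T)^{-1} Phi Sigma^{-1} y"""
    A = np.linalg.inv(Gamma) + Phi @ np.linalg.solve(Sigma, Phi.T)
    b = Phi @ np.linalg.solve(Sigma, y)
    return np.linalg.solve(A, b)


theta_wls = wls_estimate(Phi, y, Sigma)
theta_map = map_estimate(Phi, y, Sigma, 0.5 * np.eye(d))        # informative prior
theta_map_flat = map_estimate(Phi, y, Sigma, 1e8 * np.eye(d))   # nearly flat prior

print("WLS:            ", theta_wls)
print("MAP:            ", theta_map)
print("MAP, flat prior:", theta_map_flat)
```

With a very broad prior, $\Gamma^{-1}$ is close to zero and the MAP estimate approaches the WLS estimate; with an informative prior the two differ.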
- Comparison to Ordinary Least Squares (OLS): OLS assumes $\Sigma = \sigma^2 I$ (i.i.d. noise, constant variance) and no prior:
  $$\hat{\theta}_{LS} = (\Phi\Phi^T)^{-1}\Phi y$$
  Difference: MAP includes the covariance structure of the noise $\Sigma$ (handling heteroscedasticity or correlated noise) and the prior precision $\Gamma^{-1}$.
- Role of the New Terms: The term $\Gamma^{-1}$ represents the prior precision (inverse covariance).
  - It acts as a regularizer.
  - It "pulls" the estimate towards the prior mean (which is 0 in this problem).
  - Mathematically, because $\Gamma^{-1}$ is positive definite, adding it strictly increases every eigenvalue of the matrix being inverted. In the special case $\Gamma = \gamma I$, it adds $1/\gamma$ to each diagonal entry and to each eigenvalue.
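The shrinkage and eigenvalue effects described above can be illustrated with a short sketch. It assumes an isotropic prior $\Gamma = \gamma I$ and unit noise covariance for simplicity; the data and the values of `gamma` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 30
Phi = rng.normal(size=(d, n))                 # assumed: columns are feature vectors
y = Phi.T @ np.array([2.0, -1.0, 0.0, 3.0]) + rng.normal(size=n)

A = Phi @ Phi.T                               # scatter matrix with Sigma = I
b = Phi @ y

gammas = [100.0, 1.0, 0.01]                   # prior variance, from broad to tight
norms = []
for gamma in gammas:
    # MAP estimate with prior covariance gamma * I (prior precision I / gamma)
    theta = np.linalg.solve(A + np.eye(d) / gamma, b)
    norms.append(np.linalg.norm(theta))
    print(f"gamma = {gamma:>6}: ||theta_MAP|| = {norms[-1]:.4f}")

# With Gamma = gamma * I, every eigenvalue is shifted up by exactly 1 / gamma.
eig_A = np.linalg.eigvalsh(A)
eig_reg = np.linalg.eigvalsh(A + np.eye(d) / 0.01)
print("eigenvalue shift:", eig_reg - eig_A)
```

As the prior tightens (smaller `gamma`), the norm of the estimate decreases, which is the "pull" towards the zero prior mean.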
- Advantage: Yes, there is a significant advantage in setting $\Gamma^{-1}$ to something non-zero (i.e., using a prior):
  - Regularization: It prevents overfitting. When data is scarce or noisy, the prior keeps the parameter estimates from growing arbitrarily large.
  - Numerical Stability: If $\Phi\Sigma^{-1}\Phi^T$ is singular or ill-conditioned (e.g., fewer data points than features, or collinear features), its inverse does not exist or is numerically unstable, so the ML/LS estimate breaks down. Adding $\Gamma^{-1}$ (which is positive definite) makes the matrix invertible, so $(\Gamma^{-1} + \Phi\Sigma^{-1}\Phi^T)^{-1}$ always exists.
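The numerical stability point can be seen in a small sketch with fewer data points than features. It assumes unit noise covariance and a unit prior covariance, chosen only for illustration; the variable names are not from the original problem.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 3                                   # more features than data points
Phi = rng.normal(size=(d, n))                 # assumed: columns are feature vectors
y = rng.normal(size=n)

A = Phi @ Phi.T                               # d x d scatter matrix, rank at most n
Gamma_inv = np.eye(d)                         # prior precision (Gamma = I, assumed)
A_reg = A + Gamma_inv

rank_A = np.linalg.matrix_rank(A)
rank_reg = np.linalg.matrix_rank(A_reg)
cond_reg = np.linalg.cond(A_reg)

print("rank of scatter matrix:     ", rank_A, "of", d)
print("rank with prior precision:  ", rank_reg, "of", d)
print("condition number with prior:", cond_reg)

# The regularized system has a unique, finite solution.
theta_map = np.linalg.solve(A_reg, Phi @ y)
print("MAP estimate:", theta_map)
```

The scatter matrix alone is rank-deficient, so the LS estimate is not uniquely defined, while the regularized matrix is full rank and well conditioned.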