Explain
Detailed Explanation
The MAP estimate is the value of $\theta$ that maximizes the posterior probability. Depending on the assumptions, it relates to other regression methods:
1. The Formula
The derived MAP estimate, writing $X$ for the design matrix, $y$ for the observations, $\Sigma$ for the noise covariance, and $\Sigma_\theta$ for the covariance of a zero-mean Gaussian prior on $\theta$, is:

$$\hat{\theta}_{\text{MAP}} = \left(X^\top \Sigma^{-1} X + \Sigma_\theta^{-1}\right)^{-1} X^\top \Sigma^{-1} y$$
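A minimal numerical sketch of this estimate, checking the closed form against a direct maximization of the log posterior; the synthetic data, the noise and prior variances, and the use of `scipy.optimize.minimize` are illustrative assumptions, not part of the original derivation:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 40, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.5, -0.7, 0.2]) + rng.normal(scale=0.5, size=n)

Sigma_inv = np.eye(n) / 0.25          # inverse noise covariance (noise variance 0.25)
Sigma_theta_inv = np.eye(d)           # inverse prior covariance (zero-mean prior, unit variance)

# Closed-form MAP estimate from the formula above
theta_closed = np.linalg.solve(X.T @ Sigma_inv @ X + Sigma_theta_inv,
                               X.T @ Sigma_inv @ y)

# Direct maximization of the (Gaussian) log posterior, i.e. minimizing its negative
def neg_log_posterior(theta):
    r = y - X @ theta
    return 0.5 * r @ Sigma_inv @ r + 0.5 * theta @ Sigma_theta_inv @ theta

theta_opt = minimize(neg_log_posterior, np.zeros(d)).x
print(np.allclose(theta_closed, theta_opt, atol=1e-4))   # True: both give the same estimate
```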
2. Connection to Least Squares
- Ordinary Least Squares (OLS): Minimizes the plain sum of squared errors. It assumes every data point is equally informative and encodes no prior belief about $\theta$.
- Weighted Least Squares (WLS): Minimizes weighted squared errors. It uses the inverse noise covariance $\Sigma^{-1}$ as weights to give less weight to noisy observations. It corresponds to Maximum Likelihood with non-i.i.d. noise.
The MAP estimate looks like WLS but with an extra term, $\Sigma_\theta^{-1}$, added inside the matrix inversion: $X^\top \Sigma^{-1} X + \Sigma_\theta^{-1}$ instead of $X^\top \Sigma^{-1} X$.
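A compact sketch contrasting the three closed forms on heteroscedastic synthetic data (all values are illustrative; the point is only that MAP differs from WLS by the added prior-precision term):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
noise_var = rng.uniform(0.1, 2.0, size=n)                   # unequal noise per observation
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=np.sqrt(noise_var))

W = np.diag(1.0 / noise_var)       # WLS weights = inverse noise covariance
P = np.eye(d)                      # prior precision = inverse prior covariance

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)               # equal weights, no prior
theta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)       # down-weights noisy points, no prior
theta_map = np.linalg.solve(X.T @ W @ X + P, X.T @ W @ y)   # WLS plus the extra prior term
print(theta_ols, theta_wls, theta_map, sep="\n")
```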
3. The Role of $\Sigma_\theta^{-1}$ (Regularization)
The term $\Sigma_\theta^{-1}$ is the inverse of the prior covariance. It quantifies how "sure" we are that $\theta$ is close to 0 before seeing data.
- If the elements of $\Sigma_\theta$ are large (large variance), $\Sigma_\theta^{-1}$ is small. The prior is weak, and MAP $\approx$ ML.
- If the elements of $\Sigma_\theta$ are small (small variance), $\Sigma_\theta^{-1}$ is large. The prior is strong, and MAP is pulled heavily towards 0 (or the prior mean).
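A small sketch of this shrinkage effect with an isotropic prior $\Sigma_\theta = \sigma_\theta^2 I$; the data and the two prior-variance values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.3, size=30)

def map_estimate(prior_var):
    # Unit-variance i.i.d. noise, zero-mean isotropic prior with variance prior_var
    return np.linalg.solve(X.T @ X + np.eye(3) / prior_var, X.T @ y)

print(map_estimate(1e6))   # weak prior (huge variance): essentially the ML / least-squares fit
print(map_estimate(1e-3))  # strong prior (tiny variance): coefficients shrunk towards 0
```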
4. Advantages
The main advantage of the Bayesian/MAP approach (nonzero $\Sigma_\theta^{-1}$) is handling ill-posed problems:
- Invertibility: In standard regression, if you have 10 data points and 100 features, $X^\top X$ is not invertible, so you cannot solve for $\theta$ uniquely. By adding $\Sigma_\theta^{-1}$ (like adding $\lambda I$ in Ridge Regression), the matrix becomes invertible and a unique solution exists (see the sketch after this list).
- Overfitting: Standard LS tries to fit the training noise perfectly. MAP penalizes complex models (large weights), leading to better generalization on unseen data.
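A quick sketch of the invertibility point, with 10 observations and 100 features; the dimensions and the ridge value $\lambda = 1$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 100                        # far more features than observations
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

gram = X.T @ X                        # 100 x 100, but rank at most 10, hence singular
print(np.linalg.matrix_rank(gram))    # -> 10: cannot be inverted uniquely

lam = 1.0                             # plays the role of the prior precision / ridge penalty
theta_map = np.linalg.solve(gram + lam * np.eye(d), X.T @ y)   # now a unique solution exists
print(theta_map.shape)                # (100,)
```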