Prerequisites
- Posterior Distribution formulas from (a)
- Limits and Asymptotic Behavior
Step-by-Step Derivation
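From part (a), substituting $\Gamma = \alpha I$ and $\Sigma = \sigma^2 I$:
Posterior Covariance: $\hat{\Sigma}_\theta = \left(\frac{1}{\alpha} I + \frac{1}{\sigma^2} \Phi\Phi^T\right)^{-1}$
Posterior Mean: $\hat{\mu}_\theta = \frac{1}{\sigma^2} \hat{\Sigma}_\theta \Phi y$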
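As a quick numerical sanity check, the sketch below evaluates these two formulas in the simplest possible setting: a single feature ($D = 1$), where $\Phi\Phi^T$ and $\Phi y$ reduce to scalars and no matrix inverse is needed. The function name `posterior_1d` and the toy data are illustrative assumptions, not part of the original problem.

```python
def posterior_1d(phi, y, alpha, sigma2):
    """Posterior variance and mean for a single-feature linear model.

    Assumes a zero-mean Gaussian prior with variance alpha and Gaussian
    noise with variance sigma2 (the scalar case of Gamma = alpha*I,
    Sigma = sigma2*I).
    """
    s_pp = sum(p * p for p in phi)               # Phi Phi^T (a scalar here)
    s_py = sum(p * t for p, t in zip(phi, y))    # Phi y (a scalar here)
    var = 1.0 / (1.0 / alpha + s_pp / sigma2)    # posterior covariance
    mean = var * s_py / sigma2                   # posterior mean
    return var, mean


# Hypothetical toy data: y is roughly 2 * phi.
phi = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.2]

var, mean = posterior_1d(phi, y, alpha=1.0, sigma2=0.5)
print("posterior variance:", var)
print("posterior mean:", mean)
```

The scalar case keeps the arithmetic easy to check by hand; for $D > 1$ the same formulas apply with a matrix inverse (or, preferably, a linear solve) in place of the division.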
Let's analyze the limits:
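- Case $\alpha \to \infty$ (Uninformative/Flat Prior):
  - As the prior variance approaches infinity, the prior belief becomes extremely weak (we are totally uncertain about $\theta$ before seeing data).
  - The prior precision vanishes: $\frac{1}{\alpha} \to 0$.
  - Covariance: $\hat{\Sigma}_\theta \to \left(\frac{1}{\sigma^2} \Phi\Phi^T\right)^{-1} = \sigma^2 (\Phi\Phi^T)^{-1}$ (assuming $\Phi\Phi^T$ is invertible).
  - Mean: $\hat{\mu}_\theta \to \frac{1}{\sigma^2} \left[\sigma^2 (\Phi\Phi^T)^{-1}\right] \Phi y = (\Phi\Phi^T)^{-1} \Phi y$.
  - Result: The posterior mean becomes the ordinary least squares estimate, which is also the maximum likelihood estimate (MLE). The prior has no regularizing effect.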
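- Case $\alpha = 0$ (Absolute Prior Certainty):
  - (Taking the limit as $\alpha \to 0$ from the positive side, since $\frac{1}{\alpha}$ is undefined at $\alpha = 0$.)
  - The prior precision $\frac{1}{\alpha} \to \infty$. The prior becomes a Dirac delta function at $\theta = 0$.
  - Covariance: the $\frac{1}{\alpha} I$ term dominates $\frac{1}{\sigma^2} \Phi\Phi^T$, so $\hat{\Sigma}_\theta \approx \alpha I \to 0$.
  - Mean: $\hat{\mu}_\theta = \frac{1}{\sigma^2} \hat{\Sigma}_\theta \Phi y \to 0$, because $\hat{\Sigma}_\theta \to 0$ while $\frac{1}{\sigma^2} \Phi y$ stays fixed.
  - Result: The data have no effect. We are absolutely certain that $\theta = 0$ regardless of the observations.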
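- Case $\sigma^2 \to 0$ (Noise-free Observations):
  - As the observation noise goes to zero, we trust the data completely.
  - The term $\frac{1}{\sigma^2} \Phi\Phi^T$ dominates the $\frac{1}{\alpha} I$ term. The limit is easiest to see in the form from part (c): $\hat{\mu}_\theta = (\Phi\Phi^T + \lambda I)^{-1} \Phi y$, where $\lambda = \frac{\sigma^2}{\alpha}$.
  - If $\sigma^2 \to 0$ with $\alpha$ fixed, then $\lambda \to 0$.
  - Mean: $\hat{\mu}_\theta \to (\Phi\Phi^T)^{-1} \Phi y$ (assuming $\Phi\Phi^T$ is invertible). With truly noise-free observations the targets are exactly consistent with the model, so this estimate reproduces the training data exactly.
  - Covariance: $\hat{\Sigma}_\theta = \left(\frac{1}{\alpha} I + \frac{1}{\sigma^2} \Phi\Phi^T\right)^{-1} = \sigma^2 \left(\frac{\sigma^2}{\alpha} I + \Phi\Phi^T\right)^{-1}$. As $\sigma^2 \to 0$, $\hat{\Sigma}_\theta \to 0 \cdot (\Phi\Phi^T)^{-1} = 0$.
  - Result: We become perfectly certain about parameters that fit the data exactly (zero posterior uncertainty), provided the data can be perfectly fit.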
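The three limits can also be checked numerically. The sketch below does so for a single feature ($D = 1$), where $\Phi\Phi^T$ and $\Phi y$ are scalars, using the $\lambda = \frac{\sigma^2}{\alpha}$ form of the posterior. The function name `posterior_ridge_form`, the toy data, and the specific extreme values standing in for "$\to \infty$" and "$\to 0$" are all illustrative assumptions.

```python
def posterior_ridge_form(phi, y, alpha, sigma2):
    """Single-feature posterior written with lam = sigma2 / alpha.

    Algebraically the same as var = 1 / (1/alpha + s_pp/sigma2) and
    mean = var * s_py / sigma2, but it avoids dividing by a tiny sigma2.
    """
    s_pp = sum(p * p for p in phi)               # Phi Phi^T (scalar)
    s_py = sum(p * t for p, t in zip(phi, y))    # Phi y (scalar)
    lam = sigma2 / alpha
    var = sigma2 / (s_pp + lam)                  # posterior covariance
    mean = s_py / (s_pp + lam)                   # posterior mean
    return var, mean


# Hypothetical toy data.
phi = [1.0, 2.0, 3.0]
y_noisy = [2.1, 3.9, 6.2]   # roughly 2 * phi, with noise
y_exact = [2.0, 4.0, 6.0]   # exactly 2 * phi (noise-free)

s_pp = sum(p * p for p in phi)
ols = sum(p * t for p, t in zip(phi, y_noisy)) / s_pp   # least squares estimate

# alpha -> infinity: mean approaches OLS, variance approaches sigma2 / s_pp.
flat_var, flat_mean = posterior_ridge_form(phi, y_noisy, alpha=1e10, sigma2=0.5)

# alpha -> 0+: mean and variance both collapse to 0.
sharp_var, sharp_mean = posterior_ridge_form(phi, y_noisy, alpha=1e-10, sigma2=0.5)

# sigma2 -> 0: mean recovers the exact slope, variance goes to 0.
clean_var, clean_mean = posterior_ridge_form(phi, y_exact, alpha=1.0, sigma2=1e-10)

print("flat prior:  mean", flat_mean, "vs OLS", ols, "| var", flat_var)
print("sharp prior: mean", sharp_mean, "| var", sharp_var)
print("noise-free:  mean", clean_mean, "| var", clean_var)
```

Large and small finite values only approximate the limits, so the printed numbers approach the limiting values without equalling them exactly.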