
Answer

Prerequisites

  • Posterior Distribution formulas from (a)
  • Limits and Asymptotic Behavior

Step-by-Step Derivation

From part (a), substituting $\Gamma = \alpha I$ and $\Sigma = \sigma^2 I$:

Posterior covariance: $\hat{\Sigma}_\theta = \left(\frac{1}{\alpha}I + \frac{1}{\sigma^2}\Phi\Phi^T\right)^{-1}$

Posterior mean: $\hat{\mu}_\theta = \frac{1}{\sigma^2} \hat{\Sigma}_\theta \Phi y$
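These two formulas are easy to sanity-check numerically. The sketch below uses a small made-up problem (the dimensions, `Phi`, `y`, `alpha`, and `sigma2` are all illustrative assumptions, not from the original exercise), with $\Phi$ stored as a $d \times n$ matrix so that $\Phi\Phi^T$ is $d \times d$ and $\Phi y$ is a length-$d$ vector:

```python
import numpy as np

# Hypothetical toy setup: d basis functions, n observations.
# Phi is d x n (one column per observation), y is the target vector.
rng = np.random.default_rng(0)
d, n = 3, 25
Phi = rng.normal(size=(d, n))
y = Phi.T @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

alpha, sigma2 = 2.0, 0.1  # assumed prior variance and noise variance

# Posterior covariance: (I/alpha + Phi Phi^T / sigma^2)^{-1}
Sigma_post = np.linalg.inv(np.eye(d) / alpha + (Phi @ Phi.T) / sigma2)

# Posterior mean: Sigma_post @ Phi @ y / sigma^2
mu_post = Sigma_post @ (Phi @ y) / sigma2
```

Note that `Sigma_post` comes out symmetric positive definite, as a covariance matrix must be, which is a quick check that the formula was transcribed correctly.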

Let's analyze the limits:

  1. Case $\alpha \to \infty$ (Uninformative/Flat Prior):

    • As the prior variance grows without bound, our prior belief becomes extremely weak: we are essentially uncertain about $\theta$ before seeing any data.
    • $\frac{1}{\alpha} \to 0$.
    • Covariance: $\hat{\Sigma}_\theta \to \left(\frac{1}{\sigma^2}\Phi\Phi^T\right)^{-1} = \sigma^2 (\Phi\Phi^T)^{-1}$.
    • Mean: $\hat{\mu}_\theta \to \frac{1}{\sigma^2}\left[\sigma^2 (\Phi\Phi^T)^{-1}\right]\Phi y = (\Phi\Phi^T)^{-1}\Phi y$.
    • Result: The posterior mean becomes the standard ordinary least squares (equivalently, maximum likelihood) estimate. The prior has no regularizing effect.
  2. Case $\alpha \to 0^+$ (Point-Mass Prior at Zero):

    • (Taking the limit as $\alpha \to 0$ from the positive side.)
    • The prior precision $\frac{1}{\alpha} \to \infty$: the prior collapses to a Dirac delta function at zero.
    • Covariance: $\hat{\Sigma}_\theta = \left(\frac{1}{\alpha}I + \frac{1}{\sigma^2}\Phi\Phi^T\right)^{-1} \to 0$, since the $\frac{1}{\alpha}I$ term diverges.
    • Mean: $\hat{\mu}_\theta = \frac{1}{\sigma^2}\hat{\Sigma}_\theta \Phi y \to \frac{1}{\sigma^2} \cdot 0 \cdot \Phi y = 0$.
    • Result: The data has no effect. We remain absolutely certain that $\theta = 0$ regardless of the observations.
  3. Case $\sigma^2 \to 0$ (Noise-Free Observations):

    • As the observation noise vanishes, we trust the data completely.
    • The term $\frac{1}{\sigma^2}\Phi\Phi^T$ dominates the $\frac{1}{\alpha}I$ term. To make the limit explicit, factor $\sigma^2$ out of the mean.
    • Using the ridge form from part (c): $\hat{\mu}_\theta = (\Phi\Phi^T + \lambda I)^{-1}\Phi y$, where $\lambda = \frac{\sigma^2}{\alpha}$.
    • If $\sigma^2 \to 0$ (with $\alpha$ fixed), then $\lambda \to 0$.
    • Mean: $\hat{\mu}_\theta \to (\Phi\Phi^T)^{-1}\Phi y$, assuming $\Phi\Phi^T$ is invertible. The model perfectly interpolates the training data.
    • Covariance: $\hat{\Sigma}_\theta = \left(\frac{1}{\alpha}I + \frac{1}{\sigma^2}\Phi\Phi^T\right)^{-1} = \sigma^2\left(\frac{\sigma^2}{\alpha}I + \Phi\Phi^T\right)^{-1}$. As $\sigma^2 \to 0$, $\hat{\Sigma}_\theta \to 0 \cdot (\Phi\Phi^T)^{-1} = 0$.
    • Result: We become perfectly certain about the parameters that fit the data exactly (zero posterior uncertainty), provided the data can be fit perfectly.
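All three limits can be checked numerically from the same posterior formulas. This sketch (with a made-up `Phi` and `y`, and extreme but finite values standing in for the limits) compares each case against its closed-form limit:

```python
import numpy as np

# Hypothetical toy data; d < n so that Phi Phi^T (d x d) is invertible.
rng = np.random.default_rng(0)
d, n = 3, 25
Phi = rng.normal(size=(d, n))
y = rng.normal(size=n)

def posterior(alpha, sigma2):
    """Posterior covariance and mean for the isotropic Gaussian model."""
    Sigma = np.linalg.inv(np.eye(d) / alpha + (Phi @ Phi.T) / sigma2)
    mu = Sigma @ (Phi @ y) / sigma2
    return Sigma, mu

# OLS/MLE estimate (Phi Phi^T)^{-1} Phi y, the target of cases 1 and 3.
ols = np.linalg.solve(Phi @ Phi.T, Phi @ y)

# 1. alpha -> infinity: posterior mean approaches the OLS estimate.
_, mu_flat = posterior(alpha=1e12, sigma2=0.5)

# 2. alpha -> 0: mean and covariance both collapse to zero.
Sigma0, mu0 = posterior(alpha=1e-12, sigma2=0.5)

# 3. sigma^2 -> 0: mean interpolates (OLS) and covariance vanishes.
Sigma_nf, mu_nf = posterior(alpha=1.0, sigma2=1e-10)
```

Here `mu_flat` and `mu_nf` agree with `ols` to numerical precision, while `mu0`, `Sigma0`, and `Sigma_nf` are all vanishingly small, matching the three derivations above.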