Skip to main content

Answer

Pre-required Knowledge

  1. Limits involving matrices: Understanding how (A+kB)1(A + kB)^{-1} behaves when k0k \to 0 or kk \to \infty.
  2. Posterior Formulas: Σ^θ=(1αI+1σ2ΦΦT)1\hat{\Sigma}_\theta = (\frac{1}{\alpha} I + \frac{1}{\sigma^2} \Phi \Phi^T)^{-1} μ^θ=Σ^θ1σ2Φy\hat{\mu}_\theta = \hat{\Sigma}_\theta \frac{1}{\sigma^2} \Phi y

Step-by-Step Answer

  1. Case 1: α\alpha \to \infty (Infinite Prior Variance)

    • Meaning: The prior becomes "flat" or uninformative. We have no prior bias towards 0. 1α0\frac{1}{\alpha} \to 0.
    • Covariance: Σ^θ(0I+1σ2ΦΦT)1=σ2(ΦΦT)1\hat{\Sigma}_\theta \to (0 \cdot I + \frac{1}{\sigma^2} \Phi \Phi^T)^{-1} = \sigma^2 (\Phi \Phi^T)^{-1} This is the standard covariance estimate for OLS.
    • Mean: μ^θ[σ2(ΦΦT)1]1σ2Φy=(ΦΦT)1Φy\hat{\mu}_\theta \to [\sigma^2 (\Phi \Phi^T)^{-1}] \frac{1}{\sigma^2} \Phi y = (\Phi \Phi^T)^{-1} \Phi y The mean becomes exactly the Least Squares (Maximum Likelihood) estimate.
  2. Case 2: α0\alpha \to 0 (Zero Prior Variance)

    • Meaning: The prior p(θ)p(\theta) becomes a Dirac delta function at 0. We are infinitely certain that θ=0\theta = 0 before seeing data. 1α\frac{1}{\alpha} \to \infty.
    • Covariance: The term 1αI\frac{1}{\alpha} I dominates the inverse. The matrix becomes "infinitely large", so its inverse goes to 0. Σ^θ0\hat{\Sigma}_\theta \to 0
    • Mean: The prior precision dominates the data precision. μ^θ0\hat{\mu}_\theta \to 0 The data is ignored, and the posterior remains at the prior mean (0).
  3. Case 3: σ20\sigma^2 \to 0 (Zero Observation Noise)

    • Meaning: We trust the data perfectly. The precision 1σ2\frac{1}{\sigma^2} \to \infty.
    • Covariance: The data term dominates. Σ^θ=σ2()10\hat{\Sigma}_\theta = \sigma^2 (\dots)^{-1} \to 0 Our uncertainty about θ\theta vanishes (assuming ΦΦT\Phi \Phi^T is full rank, i.e., we have enough data to determine θ\theta uniquely).
    • Mean: The data term (1σ2\frac{1}{\sigma^2}) overwhelms the prior term (1α\frac{1}{\alpha}). μ^θ(ΦΦT)1Φy\hat{\mu}_\theta \to (\Phi \Phi^T)^{-1} \Phi y (Assuming ΦΦT\Phi \Phi^T is invertible). The estimate converges to the solution that interpolates the data points. If ΦΦT\Phi \Phi^T is not invertible (more parameters than data), the limit is the minimum norm solution that fits the data exactly.