Answer

Prerequisite Knowledge

  1. Bayes' Theorem for Continuous Variables: $p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})} \propto p(\mathcal{D} \mid \theta)\, p(\theta)$
  2. Multivariate Gaussian Distribution: $\mathcal{N}(x \mid \mu, \Sigma) \propto \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$
  3. Completing the Square in Matrix Form: if a quadratic form in $\theta$ looks like $-\frac{1}{2} (\theta^T A \theta - 2 \theta^T b)$, it corresponds to a Gaussian with covariance $A^{-1}$ and mean $A^{-1} b$ (ignoring constants).
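The completing-the-square rule can be sanity-checked numerically: for a random symmetric positive-definite $A$ and vector $b$, the exponent $-\frac{1}{2}(\theta^T A \theta - 2\theta^T b)$ should differ from the exponent of $\mathcal{N}(\theta \mid A^{-1}b, A^{-1})$ only by a constant that does not depend on $\theta$. A minimal sketch (all variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Random symmetric positive-definite A and arbitrary b
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)
b = rng.normal(size=d)

cov = np.linalg.inv(A)   # claimed covariance A^{-1}
mean = cov @ b           # claimed mean A^{-1} b

def quad_form(theta):
    # -(1/2)(theta^T A theta - 2 theta^T b)
    return -0.5 * (theta @ A @ theta - 2 * theta @ b)

def gauss_exponent(theta):
    # -(1/2)(theta - mean)^T A (theta - mean), the Gaussian exponent
    diff = theta - mean
    return -0.5 * diff @ A @ diff

# The two exponents should differ by a theta-independent constant
thetas = rng.normal(size=(5, d))
diffs = [quad_form(t) - gauss_exponent(t) for t in thetas]
print(np.allclose(diffs, diffs[0]))  # True
```

Expanding both exponents shows the difference is $\theta^T b - \theta^T A A^{-1} b + \text{const} = \text{const}$, which is exactly what the check confirms.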

Step-by-Step Answer

  1. Identify the Likelihood function $p(\mathcal{D} \mid \theta)$: From the model $y = \Phi^T \theta + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \Sigma)$, it follows that $y$ given $\theta$ is Gaussian:

     $$p(y \mid \theta) = \mathcal{N}(y \mid \Phi^T \theta, \Sigma) \propto \exp\left( -\frac{1}{2} (y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta) \right)$$

     Here $\mathcal{D} = (X, y)$, but since $X$ is fixed (discriminative setting), we work with $p(y \mid \theta)$.

  2. Identify the Prior distribution $p(\theta)$:

     $$p(\theta) = \mathcal{N}(\theta \mid 0, \Gamma) \propto \exp\left( -\frac{1}{2} \theta^T \Gamma^{-1} \theta \right)$$
  3. Formulate the Posterior $p(\theta \mid \mathcal{D})$: Using Bayes' rule, the posterior is proportional to the product of the likelihood and the prior:

     $$p(\theta \mid \mathcal{D}) \propto p(y \mid \theta)\, p(\theta)$$

     Substitute the exponentials:

     $$\propto \exp\left( -\frac{1}{2} (y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta) \right) \exp\left( -\frac{1}{2} \theta^T \Gamma^{-1} \theta \right)$$

     Combine the exponents into a single expression $E$:

     $$E = -\frac{1}{2} \left[ (y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta) + \theta^T \Gamma^{-1} \theta \right]$$
  4. Expand and Group Terms w.r.t. $\theta$: Expand the first term (note that $(y - \Phi^T \theta)^T = y^T - \theta^T \Phi$):

     $$(y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta) = y^T \Sigma^{-1} y - y^T \Sigma^{-1} \Phi^T \theta - \theta^T \Phi \Sigma^{-1} y + \theta^T \Phi \Sigma^{-1} \Phi^T \theta$$

     Since each of these terms is a scalar, $y^T \Sigma^{-1} \Phi^T \theta = (\theta^T \Phi \Sigma^{-1} y)^T = \theta^T \Phi \Sigma^{-1} y$ (using the symmetry of the covariance $\Sigma$), so the two cross terms combine into $-2 \theta^T \Phi \Sigma^{-1} y$.

     Now substitute back into $E$ and group by powers of $\theta$:

     $$-2E = \theta^T \Phi \Sigma^{-1} \Phi^T \theta - 2 \theta^T \Phi \Sigma^{-1} y + y^T \Sigma^{-1} y + \theta^T \Gamma^{-1} \theta$$

     Group quadratic terms ($\theta^T A \theta$) and linear terms ($-2 \theta^T b$):

     $$-2E = \theta^T (\Phi \Sigma^{-1} \Phi^T + \Gamma^{-1}) \theta - 2 \theta^T (\Phi \Sigma^{-1} y) + \text{const}$$

     where "const" collects the terms independent of $\theta$ (here $y^T \Sigma^{-1} y$).

  5. Complete the Square: Compare this to the exponent of a Gaussian $\mathcal{N}(\theta \mid \hat{\mu}, \hat{\Sigma})$:

     $$-\frac{1}{2} (\theta - \hat{\mu})^T \hat{\Sigma}^{-1} (\theta - \hat{\mu}) = -\frac{1}{2} \left[ \theta^T \hat{\Sigma}^{-1} \theta - 2 \theta^T \hat{\Sigma}^{-1} \hat{\mu} + \hat{\mu}^T \hat{\Sigma}^{-1} \hat{\mu} \right]$$

     Matching the quadratic term $\theta^T (\dots) \theta$:

     $$\hat{\Sigma}_\theta^{-1} = \Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T$$

     So,

     $$\hat{\Sigma}_\theta = \left( \Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T \right)^{-1}$$

     Matching the linear term $-2 \theta^T (\dots)$:

     $$\hat{\Sigma}_\theta^{-1} \hat{\mu}_\theta = \Phi \Sigma^{-1} y \quad\Longrightarrow\quad \hat{\mu}_\theta = \hat{\Sigma}_\theta\, \Phi \Sigma^{-1} y = \left( \Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T \right)^{-1} \Phi \Sigma^{-1} y$$
  6. Conclusion: The posterior is indeed Gaussian, with the derived mean and covariance:

     $$p(\theta \mid \mathcal{D}) = \mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta)$$