-
Bayes' Theorem:
The posterior distribution of the parameters $\theta$ given the data $\mathcal{D}$ can be found using Bayes' rule:
$$p(\theta \mid \mathcal{D}) = \frac{p(y \mid \theta, X)\, p(\theta)}{p(y \mid X)} \propto p(y \mid \theta, X)\, p(\theta),$$
where $y$ contains the targets and $X$ contains all features. To find the posterior, we can work with the unnormalized log-posterior:
$$\ln p(\theta \mid \mathcal{D}) = \ln p(y \mid \theta, X) + \ln p(\theta) + \text{const}.$$
-
Likelihood and Prior:
From the model equation $y = \Phi^T \theta + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \Sigma)$, the likelihood is $y \mid \theta, X \sim \mathcal{N}(\Phi^T \theta, \Sigma)$. Hence:
$$\ln p(y \mid \theta, X) = -\tfrac{1}{2}\,(y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta) + \text{const}.$$
The prior is $p(\theta) = \mathcal{N}(\theta \mid 0, \Gamma)$, so:
$$\ln p(\theta) = -\tfrac{1}{2}\,\theta^T \Gamma^{-1} \theta + \text{const}.$$
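As a quick numerical sanity check of these two quadratic forms, here is a minimal NumPy sketch; the dimensions, feature matrix, and covariances are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5                      # hypothetical parameter / data dimensions
Phi = rng.normal(size=(d, n))    # feature matrix, so that y = Phi.T @ theta + eps
Sigma = 0.5 * np.eye(n)          # noise covariance
Gamma = 2.0 * np.eye(d)          # prior covariance
theta = rng.normal(size=d)
y = Phi.T @ theta + rng.normal(size=n) * np.sqrt(0.5)

# Unnormalized log-likelihood: -1/2 (y - Phi^T theta)^T Sigma^{-1} (y - Phi^T theta)
r = y - Phi.T @ theta
log_lik = -0.5 * r @ np.linalg.solve(Sigma, r)

# Unnormalized log-prior: -1/2 theta^T Gamma^{-1} theta
log_prior = -0.5 * theta @ np.linalg.solve(Gamma, theta)

# The full Gaussian log-density differs from the unnormalized form only by
# the theta-independent constant -1/2 ln det(2 pi Sigma)
const = -0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1])
full_log_lik = log_lik + const
```

Because the dropped normalization term does not depend on $\theta$, it can safely be absorbed into the constant throughout the derivation.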
-
Log-Posterior:
Adding the log-likelihood and log-prior:
$$\begin{aligned}
\ln p(\theta \mid \mathcal{D}) &= -\tfrac{1}{2}\left((y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta) + \theta^T \Gamma^{-1} \theta\right) + \text{const} \\
&= -\tfrac{1}{2}\left(y^T \Sigma^{-1} y - y^T \Sigma^{-1} \Phi^T \theta - \theta^T \Phi \Sigma^{-1} y + \theta^T \Phi \Sigma^{-1} \Phi^T \theta + \theta^T \Gamma^{-1} \theta\right) + \text{const}
\end{aligned}$$
Noting that $y^T \Sigma^{-1} y$ is constant with respect to $\theta$, and that $y^T \Sigma^{-1} \Phi^T \theta = (\theta^T \Phi \Sigma^{-1} y)^T = \theta^T \Phi \Sigma^{-1} y$ (a scalar equals its transpose, and $\Sigma^{-1}$ is symmetric), we collect the terms quadratic and linear in $\theta$:
$$\ln p(\theta \mid \mathcal{D}) = -\tfrac{1}{2}\left[\theta^T \left(\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T\right)\theta - 2\,\theta^T \Phi \Sigma^{-1} y\right] + \text{const}.$$
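The collected form can be checked numerically: it should agree with the direct sum of log-likelihood and log-prior up to the dropped constant $-\tfrac{1}{2}\,y^T \Sigma^{-1} y$. A small sketch with arbitrary illustrative dimensions and data:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 5                      # hypothetical dimensions for illustration
Phi = rng.normal(size=(d, n))
Sigma = np.eye(n)
Gamma = np.eye(d)
y = rng.normal(size=n)
Sigma_inv = np.linalg.inv(Sigma)
Gamma_inv = np.linalg.inv(Gamma)

def direct(theta):
    # -1/2 [(y - Phi^T theta)^T Sigma^{-1} (y - Phi^T theta) + theta^T Gamma^{-1} theta]
    r = y - Phi.T @ theta
    return -0.5 * (r @ Sigma_inv @ r + theta @ Gamma_inv @ theta)

def collected(theta):
    # -1/2 [theta^T (Gamma^{-1} + Phi Sigma^{-1} Phi^T) theta - 2 theta^T Phi Sigma^{-1} y]
    A = Gamma_inv + Phi @ Sigma_inv @ Phi.T
    return -0.5 * (theta @ A @ theta - 2 * theta @ Phi @ Sigma_inv @ y)

# The two differ by the same theta-independent constant for every theta
diffs = [direct(t) - collected(t) for t in rng.normal(size=(4, d))]
```

The difference is constant across all sampled $\theta$ and equals $-\tfrac{1}{2}\,y^T \Sigma^{-1} y$, exactly the term absorbed into the constant above.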
-
Completing the Square:
We want to express this in the form of a general normal distribution log-pdf:
$$\ln \mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta) = -\tfrac{1}{2}\,(\theta - \hat{\mu}_\theta)^T \hat{\Sigma}_\theta^{-1} (\theta - \hat{\mu}_\theta) + \text{const}'.$$
Expanding this form yields:
$$-\tfrac{1}{2}\left(\theta^T \hat{\Sigma}_\theta^{-1} \theta - 2\,\theta^T \hat{\Sigma}_\theta^{-1} \hat{\mu}_\theta + \hat{\mu}_\theta^T \hat{\Sigma}_\theta^{-1} \hat{\mu}_\theta\right) + \text{const}'.$$
Comparing the quadratic term in $\theta$:
$$\hat{\Sigma}_\theta^{-1} = \Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T \implies \hat{\Sigma}_\theta = \left(\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T\right)^{-1}.$$
Comparing the linear term in $\theta$:
$$\hat{\Sigma}_\theta^{-1} \hat{\mu}_\theta = \Phi \Sigma^{-1} y \implies \hat{\mu}_\theta = \hat{\Sigma}_\theta \Phi \Sigma^{-1} y = \left(\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T\right)^{-1} \Phi \Sigma^{-1} y.$$
-
Conclusion:
Thus, the posterior is a Gaussian distribution $p(\theta \mid \mathcal{D}) = \mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta)$ with the required mean and covariance.
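The whole derivation can be verified end to end: if the posterior really is $\mathcal{N}(\hat{\mu}_\theta, \hat{\Sigma}_\theta)$, then the unnormalized log-posterior and the Gaussian quadratic form must differ only by a constant in $\theta$. A NumPy sketch with arbitrary illustrative sizes and data:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 8                      # hypothetical sizes for illustration
Phi = rng.normal(size=(d, n))
Sigma = 0.3 * np.eye(n)          # noise covariance
Gamma = np.eye(d)                # prior covariance
y = Phi.T @ rng.normal(size=d) + rng.normal(size=n) * np.sqrt(0.3)

Sigma_inv = np.linalg.inv(Sigma)
Gamma_inv = np.linalg.inv(Gamma)

# Posterior covariance and mean from the derivation
Sigma_hat = np.linalg.inv(Gamma_inv + Phi @ Sigma_inv @ Phi.T)
mu_hat = Sigma_hat @ Phi @ Sigma_inv @ y

def unnorm_log_post(theta):
    # ln p(y | theta, X) + ln p(theta), up to constants
    r = y - Phi.T @ theta
    return -0.5 * (r @ Sigma_inv @ r + theta @ Gamma_inv @ theta)

def gauss_quad(theta):
    # -1/2 (theta - mu_hat)^T Sigma_hat^{-1} (theta - mu_hat)
    diff = theta - mu_hat
    return -0.5 * diff @ np.linalg.solve(Sigma_hat, diff)

# Constant difference across random thetas confirms the Gaussian posterior
diffs = [unnorm_log_post(t) - gauss_quad(t) for t in rng.normal(size=(5, d))]
```

Note that $\hat{\Sigma}_\theta$ is symmetric positive definite by construction (inverse of a sum of a positive definite and a positive semidefinite matrix), as a covariance must be.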