-
Identify the Likelihood function $p(\mathcal{D} \mid \theta)$:
From the model $y = \Phi^T \theta + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \Sigma)$, we have that $y$ given $\theta$ follows a Gaussian distribution:
$$p(y \mid \theta) = \mathcal{N}(y \mid \Phi^T \theta, \Sigma) \propto \exp\left(-\tfrac{1}{2}(y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta)\right)$$
Here $\mathcal{D} = (X, y)$, but since $X$ is fixed (discriminative setting), we work with $p(y \mid \theta)$.
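As a quick sanity check, the generative model can be simulated directly. The sizes below ($d$ parameters, $n$ observations) and the diagonal noise covariance are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 50                      # illustrative: parameter dim, number of observations
Phi = rng.normal(size=(d, n))     # feature matrix; column i is the feature vector of x_i
theta_true = rng.normal(size=d)   # a fixed "true" parameter vector
Sigma = 0.25 * np.eye(n)          # noise covariance (assumed diagonal here)

# y = Phi^T theta + eps, with eps ~ N(0, Sigma)
eps = rng.multivariate_normal(np.zeros(n), Sigma)
y = Phi.T @ theta_true + eps
```

Note the dimension convention: $\Phi \in \mathbb{R}^{d \times n}$, so $\Phi^T \theta \in \mathbb{R}^n$ matches $y$.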
-
Identify the Prior distribution $p(\theta)$:
$$p(\theta) = \mathcal{N}(\theta \mid 0, \Gamma) \propto \exp\left(-\tfrac{1}{2}\theta^T \Gamma^{-1} \theta\right)$$
-
Formulate the Posterior $p(\theta \mid \mathcal{D})$:
Using Bayes' rule, the posterior is proportional to the product of the likelihood and the prior:
$$p(\theta \mid \mathcal{D}) \propto p(y \mid \theta)\, p(\theta)$$
Substitute the exponentials:
$$\propto \exp\left(-\tfrac{1}{2}(y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta)\right) \exp\left(-\tfrac{1}{2}\theta^T \Gamma^{-1} \theta\right)$$
Combine the exponents into a single expression $E$:
$$E = -\tfrac{1}{2}\left[(y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta) + \theta^T \Gamma^{-1} \theta\right]$$
-
Expand and Group Terms w.r.t. $\theta$:
Expand the first term (note that $(y - \Phi^T \theta)^T = y^T - \theta^T \Phi$):
$$(y - \Phi^T \theta)^T \Sigma^{-1} (y - \Phi^T \theta) = y^T \Sigma^{-1} y - y^T \Sigma^{-1} \Phi^T \theta - \theta^T \Phi \Sigma^{-1} y + \theta^T \Phi \Sigma^{-1} \Phi^T \theta$$
Since each cross term is a scalar, it equals its own transpose: $y^T \Sigma^{-1} \Phi^T \theta = (y^T \Sigma^{-1} \Phi^T \theta)^T = \theta^T \Phi \Sigma^{-1} y$ (using that $\Sigma$, and hence $\Sigma^{-1}$, is symmetric).
So the cross terms combine to $-2\theta^T \Phi \Sigma^{-1} y$.
Now substitute back into $E$ and group by powers of $\theta$:
$$-2E = \theta^T \Phi \Sigma^{-1} \Phi^T \theta - 2\theta^T \Phi \Sigma^{-1} y + y^T \Sigma^{-1} y + \theta^T \Gamma^{-1} \theta$$
Group quadratic terms ($\theta^T A \theta$) and linear terms ($-2\theta^T b$):
$$-2E = \theta^T \left(\Phi \Sigma^{-1} \Phi^T + \Gamma^{-1}\right)\theta - 2\theta^T \left(\Phi \Sigma^{-1} y\right) + \text{const}$$
where "const" collects the terms independent of $\theta$ (here $y^T \Sigma^{-1} y$).
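The grouping above is easy to check numerically: for random symmetric positive definite $\Sigma$ and $\Gamma$ and a random $\theta$, $-2E$ computed from the definition must equal the grouped quadratic form. A minimal sketch (all problem sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 5
Phi = rng.normal(size=(d, n))
theta = rng.normal(size=d)
y = rng.normal(size=n)

# Random symmetric positive definite covariances
A = rng.normal(size=(n, n)); Sigma = A @ A.T + n * np.eye(n)
B = rng.normal(size=(d, d)); Gamma = B @ B.T + d * np.eye(d)
Si, Gi = np.linalg.inv(Sigma), np.linalg.inv(Gamma)

# -2E computed directly from the definition of E
r = y - Phi.T @ theta
minus_2E_direct = r @ Si @ r + theta @ Gi @ theta

# -2E from the grouped form: theta^T A theta - 2 theta^T b + const
quad = theta @ (Phi @ Si @ Phi.T + Gi) @ theta
lin = -2 * theta @ (Phi @ Si @ y)
const = y @ Si @ y
minus_2E_grouped = quad + lin + const

assert np.allclose(minus_2E_direct, minus_2E_grouped)
```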
-
Complete the Square:
We compare this to the exponent of a Gaussian $\mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta)$:
$$-\tfrac{1}{2}(\theta - \hat{\mu}_\theta)^T \hat{\Sigma}_\theta^{-1} (\theta - \hat{\mu}_\theta) = -\tfrac{1}{2}\left[\theta^T \hat{\Sigma}_\theta^{-1} \theta - 2\theta^T \hat{\Sigma}_\theta^{-1} \hat{\mu}_\theta + \hat{\mu}_\theta^T \hat{\Sigma}_\theta^{-1} \hat{\mu}_\theta\right]$$
Comparing the quadratic term $\theta^T (\cdot)\, \theta$:
$$\hat{\Sigma}_\theta^{-1} = \Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T$$
So,
$$\hat{\Sigma}_\theta = \left(\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T\right)^{-1}$$
Comparing the linear term $-2\theta^T (\cdot)$:
$$\hat{\Sigma}_\theta^{-1} \hat{\mu}_\theta = \Phi \Sigma^{-1} y$$
$$\hat{\mu}_\theta = \hat{\Sigma}_\theta\, \Phi \Sigma^{-1} y = \left(\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T\right)^{-1} \Phi \Sigma^{-1} y$$
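Completing the square can also be verified numerically: with the derived $\hat{\Sigma}_\theta$ and $\hat{\mu}_\theta$, the exponent $E(\theta)$ and the Gaussian exponent $-\tfrac{1}{2}(\theta - \hat{\mu}_\theta)^T \hat{\Sigma}_\theta^{-1} (\theta - \hat{\mu}_\theta)$ should differ by a $\theta$-independent constant. A sketch with arbitrary problem sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 5
Phi = rng.normal(size=(d, n))
y = rng.normal(size=n)
A = rng.normal(size=(n, n)); Sigma = A @ A.T + n * np.eye(n)
B = rng.normal(size=(d, d)); Gamma = B @ B.T + d * np.eye(d)
Si, Gi = np.linalg.inv(Sigma), np.linalg.inv(Gamma)

# Derived posterior parameters
Sigma_hat = np.linalg.inv(Gi + Phi @ Si @ Phi.T)
mu_hat = Sigma_hat @ (Phi @ Si @ y)

def E(theta):
    """Combined exponent of likelihood times prior."""
    r = y - Phi.T @ theta
    return -0.5 * (r @ Si @ r + theta @ Gi @ theta)

def gauss_exponent(theta):
    """Exponent of N(theta | mu_hat, Sigma_hat)."""
    dlt = theta - mu_hat
    return -0.5 * dlt @ np.linalg.inv(Sigma_hat) @ dlt

# The difference must be the same constant for every theta
thetas = rng.normal(size=(4, d))
diffs = [E(t) - gauss_exponent(t) for t in thetas]
assert np.allclose(diffs, diffs[0])
```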
-
Conclusion:
The posterior is indeed Gaussian with the derived mean and covariance:
$$p(\theta \mid \mathcal{D}) = \mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta)$$
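The conclusion can be tested end to end with scipy: $\log \mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta)$ should equal $\log p(y \mid \theta) + \log p(\theta)$ up to the $\theta$-independent normalizer $\log p(y)$. A sketch under the same arbitrary-size assumptions as above:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(3)
d, n = 3, 5
Phi = rng.normal(size=(d, n))
A = rng.normal(size=(n, n)); Sigma = A @ A.T + n * np.eye(n)
B = rng.normal(size=(d, d)); Gamma = B @ B.T + d * np.eye(d)
Si = np.linalg.inv(Sigma)
y = rng.normal(size=n)

# Posterior parameters from the derivation
Sigma_hat = np.linalg.inv(np.linalg.inv(Gamma) + Phi @ Si @ Phi.T)
mu_hat = Sigma_hat @ (Phi @ Si @ y)

def log_joint(theta):
    # log p(y | theta) + log p(theta)
    return (mvn(Phi.T @ theta, Sigma).logpdf(y)
            + mvn(np.zeros(d), Gamma).logpdf(theta))

def log_post(theta):
    return mvn(mu_hat, Sigma_hat).logpdf(theta)

# The gap is the same constant (= log p(y)) for every theta
thetas = rng.normal(size=(4, d))
gaps = [log_joint(t) - log_post(t) for t in thetas]
assert np.allclose(gaps, gaps[0])
```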