Problem 3.11 Bayesian regression with a Gaussian prior
In the last problem set, we showed that various forms of linear regression by the method of least squares are really just particular cases of ML estimation under the model
$$ y = \Phi^T \theta + \epsilon \tag{3.42} $$
where $\theta = [\theta_1, \ldots, \theta_D]^T$ is the parameter vector, $y = [y_1, \ldots, y_n]^T$ is the vector of outputs, $\{x_1, \ldots, x_n\}$ is the set of corresponding inputs, and $\phi(\cdot)$ is a feature transformation (so each $\phi(x_i) \in \mathbb{R}^D$), with
$$ \Phi = [\phi(x_1), \ldots, \phi(x_n)] \tag{3.43} $$
and $\epsilon = [\epsilon_1, \ldots, \epsilon_n]^T$ is a Gaussian random vector, $\epsilon \sim \mathcal{N}(0, \Sigma)$, with covariance matrix $\Sigma$.
It is natural to consider the Bayesian extension of this model. To do so, we simply place a Gaussian prior on the parameters,
$$ p(\theta) = \mathcal{N}(\theta \mid 0, \Gamma), $$
where $\Gamma$ is the prior covariance matrix. We will first derive a general result (for generic covariance matrices $\Sigma$ and $\Gamma$), and then show how it relates to other methods.
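To make the setup concrete, the following minimal sketch (in NumPy) simulates from this generative model. The polynomial feature map $\phi(x) = [1, x, x^2]^T$ and the particular sizes and covariances used below are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

# Minimal sketch of the generative model (3.42) with the Gaussian prior:
# theta ~ N(0, Gamma), eps ~ N(0, Sigma), y = Phi^T theta + eps.
# The feature map phi(x) = [1, x, x^2] and the choices of n, D, Sigma, Gamma
# are illustrative assumptions only.
rng = np.random.default_rng(0)

n, D = 50, 3
x = rng.uniform(-1.0, 1.0, size=n)                    # inputs x_1, ..., x_n
Phi = np.vstack([x**d for d in range(D)])             # D x n matrix [phi(x_1), ..., phi(x_n)]

Gamma = np.eye(D)                                     # prior covariance Gamma
Sigma = 0.1 * np.eye(n)                               # noise covariance Sigma

theta = rng.multivariate_normal(np.zeros(D), Gamma)   # theta ~ N(0, Gamma)
eps = rng.multivariate_normal(np.zeros(n), Sigma)     # eps ~ N(0, Sigma)
y = Phi.T @ theta + eps                               # outputs, Eq. (3.42)
```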
(a) Given a training set $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, show that the posterior distribution is
$$ p(\theta \mid \mathcal{D}) = \mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta), \tag{3.44} $$
$$ \hat{\mu}_\theta = \left(\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T\right)^{-1} \Phi \Sigma^{-1} y, \tag{3.45} $$
$$ \hat{\Sigma}_\theta = \left(\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T\right)^{-1}, \tag{3.46} $$
where $\hat{\mu}_\theta$ is the posterior mean and $\hat{\Sigma}_\theta$ is the posterior covariance. Do not assume any specific form for the covariance matrices $\Sigma$ and $\Gamma$. Hint: complete the square (Problem 1.10).
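The derivation itself is the exercise (complete the square in the exponent of the product of likelihood and prior); the sketch below only evaluates the stated formulas (3.45)–(3.46) numerically on synthetic data, under the illustrative isotropic choices $\Sigma = \sigma^2 I$ and $\Gamma = \tau^2 I$, and checks that in that special case the posterior mean coincides with the ridge-regression estimate with $\lambda = \sigma^2/\tau^2$.

```python
import numpy as np

# Numerical sketch of the posterior formulas (3.45)-(3.46) on synthetic data.
# The feature map and the values sigma^2, tau^2 are illustrative assumptions.
rng = np.random.default_rng(1)
n, D = 50, 3
sigma2, tau2 = 0.1, 1.0
x = rng.uniform(-1.0, 1.0, size=n)
Phi = np.vstack([x**d for d in range(D)])              # D x n feature matrix
Sigma = sigma2 * np.eye(n)                             # noise covariance Sigma
Gamma = tau2 * np.eye(D)                               # prior covariance Gamma
y = Phi.T @ np.array([0.5, -1.0, 2.0]) + rng.normal(0.0, np.sqrt(sigma2), size=n)

Sigma_inv = np.linalg.inv(Sigma)
Sigma_post = np.linalg.inv(np.linalg.inv(Gamma) + Phi @ Sigma_inv @ Phi.T)  # Eq. (3.46)
mu_post = Sigma_post @ Phi @ Sigma_inv @ y                                  # Eq. (3.45)

# Sanity check: for Sigma = sigma^2 I and Gamma = tau^2 I, the posterior mean
# coincides with the ridge-regression estimate with lambda = sigma^2 / tau^2.
mu_ridge = np.linalg.solve(Phi @ Phi.T + (sigma2 / tau2) * np.eye(D), Phi @ y)
print(np.allclose(mu_post, mu_ridge))                  # -> True
```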