
Problem 3.11 Bayesian regression with Gaussian prior

In the last problem set, we showed that various forms of linear regression by the method of least squares are really just particular cases of ML estimation under the model

$$y = \Phi^T \theta + \epsilon \quad (3.42)$$

where $\theta = [\theta_1, \dots, \theta_D]^T$ is the parameter vector, $y = [y_1, \dots, y_n]^T$ is the vector of outputs, $\{x_1, \dots, x_n\}$ is the set of corresponding inputs, and $\phi(x_i)$ is a feature transformation, with

$$\Phi = [\phi(x_1), \dots, \phi(x_n)] \quad (3.43)$$

and $\epsilon = [\epsilon_1, \dots, \epsilon_n]^T$ is a Gaussian noise vector, $\epsilon \sim \mathcal{N}(0, \Sigma)$, with some covariance matrix $\Sigma$. It seems only natural to consider the Bayesian extension of this model. For this, we simply extend the model by considering a Gaussian prior

$$p(\theta) = \mathcal{N}(\theta | 0, \Gamma),$$

where $\Gamma$ is the covariance matrix. We will first derive a general result (for generic covariance matrices $\Sigma$ and $\Gamma$), and then show how it relates to other methods.
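To make the setup concrete, here is a minimal generative sketch of this model in Python. The polynomial feature map, the dimensions, and the isotropic choices of $\Sigma$ and $\Gamma$ are illustrative assumptions, not part of the problem.

```python
# Sampling from the Bayesian linear-regression model above:
# theta ~ N(0, Gamma), eps ~ N(0, Sigma), y = Phi^T theta + eps.
# Feature map and covariance choices are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

n, D = 20, 3
x = rng.uniform(-1.0, 1.0, size=n)
Phi = np.vander(x, D, increasing=True).T   # D x n, columns phi(x_i), Eq. (3.43)

Gamma = np.eye(D)          # prior covariance Gamma (assumed isotropic here)
Sigma = 0.1 * np.eye(n)    # noise covariance Sigma (assumed isotropic here)

theta = rng.multivariate_normal(np.zeros(D), Gamma)   # theta ~ N(0, Gamma)
eps = rng.multivariate_normal(np.zeros(n), Sigma)     # eps   ~ N(0, Sigma)
y = Phi.T @ theta + eps                                # Eq. (3.42)
```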

(a) Given a training set $\mathcal{D} = \{(x_1, y_1), \dots, (x_n, y_n)\}$, show that the posterior distribution is

$$p(\theta | \mathcal{D}) = \mathcal{N}(\theta | \hat{\mu}_\theta, \hat{\Sigma}_\theta), \quad (3.44)$$
$$\hat{\mu}_\theta = (\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T)^{-1} \Phi \Sigma^{-1} y, \quad (3.45)$$
$$\hat{\Sigma}_\theta = (\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T)^{-1}, \quad (3.46)$$

where $\hat{\mu}_\theta$ is the posterior mean and $\hat{\Sigma}_\theta$ is the posterior covariance. Do not assume any specific form of the covariance matrices $\Sigma$ and $\Gamma$. Hint: complete the square (Problem 1.10).
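Although the problem asks for a general derivation, a quick numerical sanity check can be useful. The sketch below is not the requested proof: it evaluates Eqs. (3.45)-(3.46) on synthetic data and, in the assumed special case $\Sigma = \sigma^2 I$, $\Gamma = \tau^2 I$, checks that the posterior mean coincides with the ridge-regression estimate with $\lambda = \sigma^2 / \tau^2$; all variable names and the toy feature map are illustrative assumptions.

```python
# Numerical sanity check of the posterior formulas (3.45)-(3.46).
# Assumed special case: isotropic Sigma and Gamma, toy polynomial features.
import numpy as np

rng = np.random.default_rng(0)

n, D = 50, 4
x = rng.uniform(-1.0, 1.0, size=n)
Phi = np.vander(x, D, increasing=True).T        # D x n, columns phi(x_i)
theta_true = rng.normal(size=D)
sigma, tau = 0.3, 1.0
Sigma = sigma**2 * np.eye(n)                    # noise covariance
Gamma = tau**2 * np.eye(D)                      # prior covariance
y = Phi.T @ theta_true + rng.multivariate_normal(np.zeros(n), Sigma)

# Posterior covariance and mean, Eqs. (3.46) and (3.45).
Sigma_inv = np.linalg.inv(Sigma)
Sigma_post = np.linalg.inv(np.linalg.inv(Gamma) + Phi @ Sigma_inv @ Phi.T)
mu_post = Sigma_post @ Phi @ Sigma_inv @ y

# With Sigma = sigma^2 I and Gamma = tau^2 I, the posterior mean should equal
# the ridge-regression solution with lambda = sigma^2 / tau^2.
lam = sigma**2 / tau**2
theta_ridge = np.linalg.solve(Phi @ Phi.T + lam * np.eye(D), Phi @ y)
print(np.allclose(mu_post, theta_ridge))        # expected: True
```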