Question

(e) Given a novel input $x_*$, show that the predictive distribution of $f_* = f(x_*, \theta)$ is

$$p(f_* \mid x_*, \mathcal{D}) = \mathcal{N}(f_* \mid \hat{\mu}_*, \hat{\sigma}_*^2), \tag{3.50}$$
$$\hat{\mu}_* = \phi(x_*)^T \hat{\mu}_\theta, \tag{3.51}$$
$$\hat{\sigma}_*^2 = \phi(x_*)^T \hat{\Sigma}_\theta \phi(x_*). \tag{3.52}$$
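As a concrete check on (3.51)-(3.52), here is a minimal NumPy sketch (not from the text) that fits Bayesian linear regression on toy data with a hypothetical feature map $\phi(x) = [1, x]$ and a Gaussian prior $\mathcal{N}(0, \tau^2 I)$, then evaluates the predictive mean and variance at a novel input:

```python
import numpy as np

# Hypothetical feature map: phi(x) = [1, x]
def phi(x):
    return np.array([1.0, x])

# Toy training data (made up for illustration)
X = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([-0.9, 0.1, 1.2, 1.8])
sigma2 = 0.1   # observation noise variance sigma^2
tau2 = 1.0     # prior variance on theta

Phi = np.stack([phi(x) for x in X])   # design matrix, shape (N, 2)

# Gaussian posterior over theta: N(mu_hat, Sigma_hat)
Sigma_hat = np.linalg.inv(Phi.T @ Phi / sigma2 + np.eye(2) / tau2)
mu_hat = Sigma_hat @ Phi.T @ y / sigma2

# Predictive distribution of f_* at a novel input, eqs. (3.51)-(3.52)
x_star = 1.5
mu_star = phi(x_star) @ mu_hat                     # (3.51)
var_star = phi(x_star) @ Sigma_hat @ phi(x_star)   # (3.52)
print(mu_star, var_star)
```

The posterior moments here follow the standard conjugate-Gaussian formulas; only the data, prior scale, and feature map are invented for the example.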

(Hint: see Problem 1.1.) Assuming the same observation noise variance $\sigma^2$ as in the training set, show that the predictive distribution of $y_*$ is

$$p(y_* \mid x_*, \mathcal{D}) = \int p(y_* \mid x_*, \theta)\, p(\theta \mid \mathcal{D})\, d\theta = \mathcal{N}(y_* \mid \hat{\mu}_*, \sigma^2 + \hat{\sigma}_*^2). \tag{3.53}$$

Hint: note that $p(y_* \mid x_*, \theta)$ depends on $\theta$ only through $f_* = \phi(x_*)^T \theta$. Hence we can rewrite the integral over $\theta$ as an integral over $f_*$, replacing $p(\theta \mid \mathcal{D})$ with $p(f_* \mid \mathcal{D})$.
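The variance-adding behavior in (3.53) can be verified by Monte Carlo: sample $\theta$ from the posterior, then sample $y_*$ given $\theta$, and the empirical moments should match $\hat{\mu}_*$ and $\sigma^2 + \hat{\sigma}_*^2$. A sketch with made-up posterior numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

phi_star = np.array([1.0, 1.5])          # hypothetical phi(x_*)
mu_hat = np.array([0.2, 0.9])            # hypothetical posterior mean
Sigma_hat = np.array([[0.05, 0.01],
                      [0.01, 0.04]])     # hypothetical posterior covariance
sigma2 = 0.1                             # observation noise variance

n = 200_000
# theta ~ p(theta | D), then y_* | theta ~ N(phi(x_*)^T theta, sigma^2)
theta = rng.multivariate_normal(mu_hat, Sigma_hat, size=n)
y_star = theta @ phi_star + rng.normal(0.0, np.sqrt(sigma2), size=n)

mu_star = phi_star @ mu_hat                    # eq. (3.51)
var_star = phi_star @ Sigma_hat @ phi_star     # eq. (3.52)
print(y_star.mean(), mu_star)                  # should agree closely
print(y_star.var(), sigma2 + var_star)         # variances add, as in (3.53)
```

This is exactly the substitution the hint describes: since $y_*$ depends on $\theta$ only through $f_* = \phi(x_*)^T \theta$, the sampled $f_*$ values are Gaussian with moments (3.51)-(3.52), and adding independent noise adds $\sigma^2$ to the variance.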

This is the linear version of Gaussian process regression. We will see how to derive the nonlinear (kernel) version in a later problem set.