(e) Given a novel input $x_*$, show that the predictive distribution of $f_* = f(x_*, \theta)$ is
$$p(f_* \mid x_*, \mathcal{D}) = \mathcal{N}(f_* \mid \hat{\mu}_*, \hat{\sigma}_*^2), \tag{3.50}$$
$$\hat{\mu}_* = \phi(x_*)^\top \hat{\mu}_\theta, \tag{3.51}$$
$$\hat{\sigma}_*^2 = \phi(x_*)^\top \hat{\Sigma}_\theta \, \phi(x_*). \tag{3.52}$$
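As a sketch of the argument (not a full solution): $f_* = \phi(x_*)^\top \theta$ is a linear function of $\theta$, and the posterior $p(\theta \mid \mathcal{D}) = \mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta)$ is Gaussian, so $f_*$ is Gaussian with moments

```latex
% A linear map of a Gaussian is Gaussian; its moments follow directly:
\mathbb{E}[f_*] = \phi(x_*)^\top \mathbb{E}[\theta \mid \mathcal{D}]
               = \phi(x_*)^\top \hat{\mu}_\theta,
\qquad
\mathrm{Var}[f_*] = \phi(x_*)^\top \mathrm{Cov}[\theta \mid \mathcal{D}] \, \phi(x_*)
                  = \phi(x_*)^\top \hat{\Sigma}_\theta \, \phi(x_*).
```

which match (3.51) and (3.52).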
(Hint: see Problem 1.1.) Assuming the same observation noise $\sigma^2$ as the training set, show that the predictive distribution of $y_*$ is
$$p(y_* \mid x_*, \mathcal{D}) = \int p(y_* \mid x_*, \theta) \, p(\theta \mid \mathcal{D}) \, d\theta = \mathcal{N}(y_* \mid \hat{\mu}_*, \sigma^2 + \hat{\sigma}_*^2). \tag{3.53}$$
Hint: note that $p(y_* \mid x_*, \theta)$ depends on $\theta$ only through $f_* = \phi(x_*)^\top \theta$. Hence, we can replace the integral over $\theta$ with an integral over $f_*$, substituting $p(f_* \mid \mathcal{D})$ for $p(\theta \mid \mathcal{D})$.
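The predictive formulas (3.51)–(3.53) can be checked numerically. The following is a minimal NumPy sketch; the quadratic feature map `phi`, the noise variance `sigma2`, the prior variance `tau2`, and the synthetic data are all illustrative assumptions, not part of the problem. It uses the standard Bayesian linear regression posterior for a zero-mean isotropic Gaussian prior, then compares the analytic predictive moments against Monte Carlo samples of $y_*$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature map: quadratic features phi(x) = [1, x, x^2].
def phi(x):
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

sigma2 = 0.25   # observation noise variance sigma^2 (assumed known)
tau2 = 1.0      # prior variance: theta ~ N(0, tau2 * I), an illustrative choice

# Synthetic training data from a known theta.
X = rng.uniform(-1, 1, size=20)
theta_true = np.array([0.5, -1.0, 2.0])
y = phi(X) @ theta_true + np.sqrt(sigma2) * rng.normal(size=X.shape)

# Posterior p(theta | D) = N(mu_theta, Sigma_theta) for a Gaussian prior
# (standard Bayesian linear regression result).
Phi = phi(X)
Sigma_theta = np.linalg.inv(Phi.T @ Phi / sigma2 + np.eye(3) / tau2)
mu_theta = Sigma_theta @ Phi.T @ y / sigma2

# Predictive distribution at a novel input x_star.
x_star = np.array([0.3])
phi_star = phi(x_star)[0]
mu_star = phi_star @ mu_theta                # Eq. (3.51)
var_f = phi_star @ Sigma_theta @ phi_star    # Eq. (3.52)
var_y = sigma2 + var_f                       # Eq. (3.53)

# Monte Carlo check: sample theta ~ p(theta | D), then y_star | theta.
thetas = rng.multivariate_normal(mu_theta, Sigma_theta, size=200_000)
y_samples = thetas @ phi_star + np.sqrt(sigma2) * rng.normal(size=200_000)
print("analytic:", mu_star, var_y)
print("monte carlo:", y_samples.mean(), y_samples.var())
```

The Monte Carlo mean and variance should agree with $\hat{\mu}_*$ and $\sigma^2 + \hat{\sigma}_*^2$ up to sampling error, illustrating that marginalizing $\theta$ inflates the predictive variance by $\hat{\sigma}_*^2$ beyond the observation noise alone.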
This is the linear version of Gaussian process regression. We will see how to derive the nonlinear (kernel) version in a later problem set.