In this problem we consider linear regression and the connections between the maximum likelihood and least squares solutions. Consider the polynomial function of $x \in \mathbb{R}$,
$$f(x,\theta) = \sum_{k=0}^{K} x^k \theta_k = \phi(x)^T \theta \tag{2.7}$$
where we define the feature transformation $\phi(x)$ and the parameter vector $\theta$ (both of dimension $D = K+1$) as
$$\phi(x) = [1, x, x^2, \dots, x^K]^T \in \mathbb{R}^D, \qquad \theta = [\theta_0, \dots, \theta_K]^T \in \mathbb{R}^D. \tag{2.8}$$
Given an input $x$, instead of observing the actual function value $f(x,\theta)$, we observe a noisy version $y$,
$$y = f(x,\theta) + \epsilon, \tag{2.9}$$
where $\epsilon$ is a Gaussian random variable with zero mean and variance $\sigma^2$. Our goal is to obtain the best estimate of the function given the i.i.d. samples $\mathcal{D} = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
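The generative model above can be sketched in code. The snippet below is a minimal illustration (not part of the problem statement): it implements the polynomial feature map $\phi$, the function $f(x,\theta) = \phi(x)^T\theta$, and draws i.i.d. noisy samples $y_i = f(x_i,\theta) + \epsilon_i$ with $\epsilon_i \sim \mathcal{N}(0,\sigma^2)$. The specific values of $K$, $\sigma$, $n$, and the sampling range for $x$ are arbitrary choices for the demonstration.

```python
import numpy as np

def phi(x, K):
    """Polynomial feature map: phi(x) = [1, x, x^2, ..., x^K]^T, dimension D = K+1."""
    return np.array([x**k for k in range(K + 1)])

def f(x, theta):
    """The polynomial f(x, theta) = phi(x)^T theta."""
    K = len(theta) - 1
    return phi(x, K) @ theta

# Draw n i.i.d. noisy observations y = f(x, theta) + eps, eps ~ N(0, sigma^2).
# K, sigma, n, and the input range are illustrative choices.
rng = np.random.default_rng(0)
K, sigma, n = 3, 0.1, 50
theta_true = rng.standard_normal(K + 1)          # some ground-truth parameters
xs = rng.uniform(-1.0, 1.0, size=n)              # inputs x_1, ..., x_n
ys = np.array([f(x, theta_true) for x in xs]) + sigma * rng.standard_normal(n)
```

The dataset `(xs, ys)` plays the role of $\mathcal{D}$; the least squares formulation in part (a) estimates $\theta$ from these pairs.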
(a) Formulate the problem as one of least squares, i.e., define