
Problem 2.8 Least-squares regression and MLE

In this problem we will consider linear regression and the connections between maximum likelihood and least-squares solutions. Consider the polynomial function of $x \in \mathbb{R}$,

$$f(x, \theta) = \sum_{k=0}^K x^k \theta_k = \phi(x)^T \theta \quad \quad (2.7)$$

where we define the feature transformation $\phi(x)$ and the parameter vector $\theta$ (both of dimension $D = K + 1$) as

$$\phi(x) = \left[ 1, x, x^2, \cdots, x^K \right]^T \in \mathbb{R}^D, \quad \theta = \left[ \theta_0, \cdots, \theta_K \right]^T \in \mathbb{R}^D. \quad \quad (2.8)$$

Given an input $x$, instead of observing the actual function value $f(x, \theta)$, we observe a noisy version $y$,

$$y = f(x, \theta) + \epsilon \quad \quad (2.9)$$

where $\epsilon$ is a Gaussian random variable with zero mean and variance $\sigma^2$. Our goal is to obtain the best estimate of the function given i.i.d. samples $\mathcal{D} = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
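To make the data model concrete, here is a minimal NumPy sketch of the feature transformation (2.8) and the noisy observation model (2.9). The specific values of `K`, `n`, `sigma`, and `theta_true` are hypothetical, chosen only for illustration.

```python
import numpy as np

def phi(x, K):
    # Feature transformation phi(x) = [1, x, x^2, ..., x^K]^T  (Eq. 2.8);
    # for an array of inputs this returns one row phi(x_i)^T per sample.
    return np.vander(np.atleast_1d(x), K + 1, increasing=True)

# Hypothetical example values (not from the problem statement)
rng = np.random.default_rng(0)
K, n, sigma = 3, 50, 0.1
theta_true = rng.standard_normal(K + 1)

x = rng.uniform(-1.0, 1.0, n)
# Noisy observations y = f(x, theta) + eps, eps ~ N(0, sigma^2)  (Eq. 2.9)
y = phi(x, K) @ theta_true + sigma * rng.standard_normal(n)
```

Stacking the rows $\phi(x_i)^T$ into a design matrix is the standard way to write the $n$ observations compactly as one matrix-vector product.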

(b) Formulate the problem as one of ML estimation, i.e. write down the likelihood function $p(y \mid x, \theta)$, and compute the ML estimate, i.e. the value of $\theta$ that maximizes $p(y_1, \cdots, y_n \mid x_1, \cdots, x_n, \theta)$. Show that this is equivalent to (a).

Hint: the vector derivatives listed in Problem 2.6 might be helpful.
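As a numerical sanity check for part (b): since the problem asks to show that ML estimation under Gaussian noise is equivalent to least squares, the estimate can be computed as the least-squares solution $\hat{\theta} = (\Phi^T \Phi)^{-1} \Phi^T y$, where $\Phi$ stacks the rows $\phi(x_i)^T$. The sketch below is an illustration of that closed form, not the derivation the problem asks for; the function name `ml_estimate` is ours.

```python
import numpy as np

def ml_estimate(x, y, K):
    # Design matrix Phi with rows phi(x_i)^T = [1, x_i, ..., x_i^K]
    Phi = np.vander(x, K + 1, increasing=True)
    # ML / least-squares solution theta = (Phi^T Phi)^{-1} Phi^T y,
    # computed via lstsq rather than an explicit inverse for stability
    theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta_hat

# Sanity check on noiseless data (sigma = 0): the estimate should
# recover the generating parameters exactly.
theta_true = np.array([1.0, -2.0, 0.5])
x = np.linspace(-1.0, 1.0, 20)
y = np.vander(x, 3, increasing=True) @ theta_true
assert np.allclose(ml_estimate(x, y, 2), theta_true)
```

With noisy data the recovery is only approximate, and the spread of $\hat{\theta}$ around $\theta$ shrinks as $n$ grows, consistent with the Gaussian likelihood model.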