Question

In this problem we will consider the problem of linear regression and the connections between maximum likelihood and least squares solutions. Consider the polynomial function of $x \in \mathbb{R}$,

$$f(x, \theta) = \sum_{k=0}^{K} x^k \theta_k = \phi(x)^T \theta \quad (2.7)$$

where we define the feature transformation $\phi(x)$ and the parameter vector $\theta$ (both of dimension $D = K+1$) as

$$\phi(x) = [1, x, x^2, \ldots, x^K]^T \in \mathbb{R}^D, \quad \theta = [\theta_0, \ldots, \theta_K]^T \in \mathbb{R}^D. \quad (2.8)$$
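For concreteness, here is a minimal sketch of the feature transformation (2.8) in Python; the function name `poly_features` and the use of NumPy are illustrative assumptions, not part of the problem statement:

```python
import numpy as np

def poly_features(x: float, K: int) -> np.ndarray:
    """Illustrative sketch: map a scalar x to phi(x) = [1, x, x^2, ..., x^K]^T, eq. (2.8)."""
    return np.array([x**k for k in range(K + 1)])  # shape (D,), with D = K + 1
```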

Given an input $x$, instead of observing the actual function value $f(x, \theta)$, we observe a noisy version $y$,

$$y = f(x, \theta) + \epsilon, \quad (2.9)$$

where $\epsilon$ is a Gaussian random variable with zero mean and variance $\sigma^2$. Our goal is to obtain the best estimate of the function given iid samples $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$.
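As a hedged illustration of the observation model (2.9), the following sketch draws $n$ iid samples under an assumed "true" parameter vector; all variable names and the particular values of $K$, $n$, and $\sigma$ are hypothetical choices for the demo, not given in the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, sigma = 3, 50, 0.1                     # hypothetical choices, not from the problem
theta_true = rng.normal(size=K + 1)          # assumed "true" parameters for the demo
xs = rng.uniform(-1.0, 1.0, size=n)
Phi = np.vstack([xs**k for k in range(K + 1)])        # D x n; column i is phi(x_i), eq. (2.10)
ys = Phi.T @ theta_true + sigma * rng.normal(size=n)  # y_i = phi(x_i)^T theta + eps_i, eq. (2.9)
```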

(a) Formulate the problem as one of least squares, i.e., define

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad \Phi = [\phi(x_1), \ldots, \phi(x_n)] = \begin{bmatrix} 1 & \cdots & 1 \\ x_1^1 & \cdots & x_n^1 \\ \vdots & & \vdots \\ x_1^K & \cdots & x_n^K \end{bmatrix} \quad (2.10)$$

and find the value of $\theta$ that minimizes the sum-squared-error,

$$\sum_{i=1}^{n} \left(y_i - \phi(x_i)^T \theta\right)^2 = \|y - \Phi^T \theta\|^2. \quad (2.11)$$
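As a numerical sanity check for part (a), continuing the hypothetical setup from the sketch above: setting the gradient of (2.11) to zero yields the normal equations $\Phi \Phi^T \theta = \Phi y$, whose solution (assuming $\Phi \Phi^T$ is invertible) can be compared against a library least-squares solver:

```python
import numpy as np

# Continuing the sketch above: minimize ||y - Phi^T theta||^2 from eq. (2.11).
# Setting the gradient to zero gives the normal equations  Phi Phi^T theta = Phi y.
theta_ls = np.linalg.solve(Phi @ Phi.T, Phi @ ys)       # assumes Phi Phi^T is invertible
theta_ref, *_ = np.linalg.lstsq(Phi.T, ys, rcond=None)  # library least-squares reference
assert np.allclose(theta_ls, theta_ref)
```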