We observe $y_i = \phi(x_i)^T \theta + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$.
This implies that, given $x_i$ and $\theta$, $y_i$ follows a Gaussian distribution with mean $\mu_i = \phi(x_i)^T \theta$ and variance $\sigma^2$:

$$p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{\left( y_i - \phi(x_i)^T \theta \right)^2}{2\sigma^2} \right)$$
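As a quick numerical sanity check, this density can be compared against `scipy.stats.norm`. This is a minimal sketch: the feature vector `phi_xi`, parameters `theta`, noise level `sigma`, and observation `y_i` below are hypothetical values chosen only for illustration.

```python
# Sketch (hypothetical values): verify the Gaussian density formula for a
# single observation y_i against scipy.stats.norm.pdf.
import numpy as np
from scipy.stats import norm

phi_xi = np.array([1.0, 0.5, 0.25])  # assumed feature vector phi(x_i)
theta  = np.array([2.0, -1.0, 0.3])  # assumed parameter vector theta
sigma  = 0.8                         # assumed noise standard deviation
y_i    = 1.9                         # assumed observation

mu_i = phi_xi @ theta                # mean mu_i = phi(x_i)^T theta
manual = np.exp(-(y_i - mu_i) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
print(np.isclose(manual, norm.pdf(y_i, loc=mu_i, scale=sigma)))  # True
```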
Since the samples are independent and identically distributed (i.i.d.), the likelihood of the entire dataset is the product of the individual densities:

$$L(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta)$$

Taking the logarithm gives the log-likelihood:

$$\ell(\theta) = \ln L(\theta) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - \phi(x_i)^T \theta \right)^2$$
To find the ML estimate $\hat{\theta}_{\text{ML}}$, we maximize $\ell(\theta)$ with respect to $\theta$.
Notice that the first term, $-\frac{n}{2} \ln(2\pi\sigma^2)$, is constant with respect to $\theta$ and can be ignored.
Maximizing $\ell(\theta)$ is therefore equivalent to maximizing the remaining term:

$$-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - \phi(x_i)^T \theta \right)^2$$

Since $\frac{1}{2\sigma^2} > 0$, maximizing this negative quantity is equivalent to minimizing the sum of squared residuals:

$$\hat{\theta}_{\text{ML}} = \arg\min_{\theta} \sum_{i=1}^{n} \left( y_i - \phi(x_i)^T \theta \right)^2$$
This objective function is exactly the sum-squared-error from Part (a).
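A short numerical check of this equivalence, using synthetic data (the design matrix `Phi`, the true parameters, and `sigma` below are made up for illustration): at any $\theta$, the log-likelihood equals the constant minus the scaled SSE, so the same $\theta$ optimizes both.

```python
# Sketch with synthetic data: confirm that the Gaussian log-likelihood equals
# -n/2 * ln(2*pi*sigma^2) - SSE(theta) / (2*sigma^2) at an arbitrary theta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.5
Phi = rng.normal(size=(n, d))            # rows play the role of phi(x_i)^T
theta_true = np.array([1.0, -2.0, 0.5])  # assumed true parameters
y = Phi @ theta_true + sigma * rng.normal(size=n)

theta = rng.normal(size=d)               # arbitrary test point
log_lik = norm.logpdf(y, loc=Phi @ theta, scale=sigma).sum()
sse = np.sum((y - Phi @ theta) ** 2)
print(np.isclose(log_lik,
                 -n / 2 * np.log(2 * np.pi * sigma**2) - sse / (2 * sigma**2)))  # True
```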
Therefore, minimizing the sum of squared errors is equivalent to maximizing the likelihood under the assumption of Gaussian noise. The solution is the same as the least-squares estimate from Part (a):

$$\hat{\theta}_{\text{ML}} = (\Phi^T \Phi)^{-1} \Phi^T y$$

where $\Phi$ is the design matrix whose $i$-th row is $\phi(x_i)^T$ and $y = (y_1, \dots, y_n)^T$.
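As a final sketch (same assumed synthetic setup as above), the least-squares solution from `np.linalg.lstsq` coincides with the likelihood maximizer found by a generic numerical optimizer, as the derivation predicts.

```python
# Sketch: the minimizer of the SSE (via lstsq) matches the numerical
# maximizer of the Gaussian log-likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, d, sigma = 100, 3, 0.3
Phi = rng.normal(size=(n, d))            # assumed design matrix
y = Phi @ np.array([0.5, 1.5, -1.0]) + sigma * rng.normal(size=n)

theta_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # minimizes the SSE

# Negative log-likelihood, dropping the term that is constant in theta.
nll = lambda th: np.sum((y - Phi @ th) ** 2) / (2 * sigma**2)
theta_ml = minimize(nll, x0=np.zeros(d)).x

print(np.allclose(theta_ls, theta_ml, atol=1e-5))   # True
```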