Define the Probability Model
We are given that $y_i = \phi(x_i)^T \theta + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$.
Because $\phi(x_i)^T \theta$ is a deterministic value for a given $x_i$ and $\theta$, the distribution of $y_i$ is a Gaussian centered at $\phi(x_i)^T \theta$:

$$p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \phi(x_i)^T \theta)^2}{2\sigma^2}\right)$$
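As a quick numerical sanity check on this density (a minimal sketch; the helper name `gaussian_pdf` and the test values are illustrative, not from the problem statement):

```python
import numpy as np

def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y, per the formula above."""
    return np.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# At its mean, a unit-variance Gaussian peaks at 1 / sqrt(2*pi) ~ 0.3989
peak = gaussian_pdf(0.0, 0.0, 1.0)
print(peak)
```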
Write the Likelihood Function
Since the samples $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n$ are independently and identically distributed (i.i.d.), the joint likelihood of all $n$ observations is the product of their individual probabilities:

$$L(\theta) = \prod_{i=1}^n p(y_i \mid x_i, \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \phi(x_i)^T \theta)^2}{2\sigma^2}\right)$$
Compute the Log-Likelihood Function
To find the maximum, it is mathematically much simpler to maximize the natural logarithm of the likelihood, $\ln L(\theta)$, often denoted $\ell(\theta)$. The logarithm is a monotonically increasing function, so maximizing $\ell(\theta)$ is equivalent to maximizing $L(\theta)$. Taking the logarithm turns the product into a sum:

$$\ell(\theta) = \ln L(\theta) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \phi(x_i)^T \theta)^2$$
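The practical payoff of working in log space can be verified numerically (a minimal sketch; the sample densities `p` are synthetic, not from the text): the log of a product of per-sample probabilities equals the sum of their logs.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins for the per-sample densities p(y_i | x_i, theta)
p = rng.uniform(0.1, 0.9, size=50)

# log(prod p_i) == sum(log p_i): the identity behind the log-likelihood
log_lik_product = np.log(np.prod(p))
log_lik_sum = np.sum(np.log(p))
print(log_lik_product, log_lik_sum)
```

For large $n$ the raw product underflows to zero in floating point, which is another reason the sum-of-logs form is preferred in practice.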
Show Equivalence to Least Squares
We want to find the $\theta$ that maximizes $\ell(\theta)$.
Notice that the first term, $-\frac{n}{2}\ln(2\pi\sigma^2)$, is a constant with respect to $\theta$, and the factor $\frac{1}{2\sigma^2}$ is a positive constant.
Therefore, maximizing the remaining negative term is exactly equivalent to minimizing the positive summation:

$$\arg\max_\theta \ell(\theta) = \arg\min_\theta \sum_{i=1}^n (y_i - \phi(x_i)^T \theta)^2$$
This summation is exactly the sum-squared-error objective function $J(\theta)$ from part (a):

$$J(\theta) = \sum_{i=1}^n (y_i - \phi(x_i)^T \theta)^2 = \|\mathbf{y} - \Phi^T \theta\|^2$$
Conclusion
Since the two optimization problems are identical, the maximum likelihood (ML) estimate $\hat{\theta}_{ML}$ must coincide with the least squares (LS) estimate $\hat{\theta}_{LS}$:

$$\hat{\theta}_{ML} = \arg\max_\theta \ell(\theta) = \arg\min_\theta \|\mathbf{y} - \Phi^T \theta\|^2 = \hat{\theta}_{LS}$$
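The equivalence can be confirmed numerically (a minimal sketch; the polynomial feature map, true parameters, and noise level are illustrative assumptions): the least-squares fit of $\mathbf{y} \approx \Phi^T \theta$ and the ML estimate obtained by setting the gradient of $\ell(\theta)$ to zero, which yields the normal equations $(\Phi \Phi^T)\theta = \Phi \mathbf{y}$, give the same $\hat{\theta}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature map phi(x) = [1, x, x^2]; Phi is d x n, columns phi(x_i)
def phi(x):
    return np.stack([np.ones_like(x), x, x ** 2], axis=0)

theta_true = np.array([1.0, -2.0, 0.5])
x = rng.uniform(-1.0, 1.0, size=200)
Phi = phi(x)
y = Phi.T @ theta_true + rng.normal(0.0, 0.1, size=200)  # y_i = phi(x_i)^T theta + eps_i

# Least-squares estimate: minimize ||y - Phi^T theta||^2
theta_ls, *_ = np.linalg.lstsq(Phi.T, y, rcond=None)

# ML estimate: zero gradient of the log-likelihood -> normal equations
theta_ml = np.linalg.solve(Phi @ Phi.T, Phi @ y)

print(theta_ls, theta_ml)
```

Both routes solve the same quadratic problem, so the printed vectors agree to numerical precision.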