Explain
Intuition
In part (a), we looked at Least-Squares Regression purely from a geometric and algebraic perspective: find the line that minimizes the sum of squared vertical distances to the points.
In this part, we use Maximum Likelihood Estimation (MLE), which takes a probabilistic perspective. Instead of asking "what line minimizes the distance?", MLE asks "assuming the data was generated by our model plus some random noise, what parameters make the data we actually observed the most probable?"
The Connection: Gaussian Noise is the Bridge
Why do these two completely different philosophies lead to the exact same math? The secret lies in our assumption about the noise.
Because the measurement noise $\epsilon_i$ at each point is assumed to follow a Gaussian (Normal) distribution with variance $\sigma^2$, the probability of a specific error drops off exponentially with the square of that error:

$$p(\epsilon_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\epsilon_i^2}{2\sigma^2}\right)$$
When we want to maximize the probability (likelihood) of observing all the points simultaneously, we multiply their individual probabilities. Multiplying exponentials adds the exponents together:

$$L = \prod_{i=1}^{n} p(\epsilon_i) \;\propto\; \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\epsilon_i^2\right)$$
To make the overall probability as large as possible (MLE), we need the negative exponent to be as close to zero as possible. This means we must minimize the sum of the squared errors, which brings us right back to the objective of Least-Squares!
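To see this equivalence numerically, here is a minimal pure-Python sketch (the toy data and all names are my own): it fits a line with the closed-form least-squares formulas, then checks that those same parameters give a higher Gaussian log-likelihood than any nearby candidates.

```python
import math

# Toy data roughly following y = 2x + 1 plus noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

def least_squares(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def gaussian_log_likelihood(slope, intercept, xs, ys, sigma=1.0):
    """Log-likelihood of the data under y = slope*x + intercept + N(0, sigma^2)."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (y - (slope * x + intercept)) ** 2 / (2 * sigma ** 2)
               for x, y in zip(xs, ys))

a, b = least_squares(xs, ys)
best = gaussian_log_likelihood(a, b, xs, ys)

# Perturbing the least-squares parameters can only lower the likelihood,
# because the log-likelihood is (a constant minus) the sum of squared errors.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert gaussian_log_likelihood(a + da, b + db, xs, ys) < best
```

The key line is the log-likelihood: it is just a constant minus the sum of squared errors scaled by $1/(2\sigma^2)$, so maximizing one is exactly minimizing the other.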
Key Takeaway
Least Squares is precisely the Maximum Likelihood Estimate when the noise affecting your data is assumed to be Gaussian. If the noise followed a different distribution, such as a Laplace distribution, MLE would lead to a different objective function, such as minimizing the absolute error instead of the squared error.
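A quick pure-Python illustration of that last point, using the simplest possible model (estimating a single constant rather than a line; the data is made up): under Gaussian noise the MLE is the mean, which minimizes squared error, while under Laplace noise the MLE is the median, which minimizes absolute error. An outlier pulls the two estimates apart.

```python
# Estimating a constant c from noisy observations.
# Gaussian MLE minimizes sum((y - c)^2)  -> the sample mean.
# Laplace  MLE minimizes sum(|y - c|)    -> the sample median.
data = [2.0, 2.1, 1.9, 2.0, 10.0]  # one outlier

gaussian_mle = sum(data) / len(data)        # mean
laplace_mle = sorted(data)[len(data) // 2]  # median (odd-length list)

# The outlier drags the squared-error estimate (mean) far away,
# while the absolute-error estimate (median) barely moves.
```

This is why absolute-error (Laplace) fitting is often described as more robust to outliers than squared-error (Gaussian) fitting.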