Explain
1. Matrix Dimensions Formulation
In standard textbooks, the design matrix is often defined as $X \in \mathbb{R}^{N \times d}$, where each row is a sample. However, in this problem, $X$ is defined as $X \in \mathbb{R}^{d \times N}$, where each column is a sample $x_i \in \mathbb{R}^d$.
- $X$ has dimension $d \times N$.
- $w$ has dimension $d \times 1$.
- $y$ has dimension $N \times 1$.
So the linear model prediction for all samples is $X^\top w$ (dimension $N \times 1$), which matches the dimension of $y$. This is why the prediction term is $X^\top w$ rather than $Xw$.
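A quick sanity check on these dimensions (a minimal NumPy sketch; the values of $d$ and $N$ below are arbitrary illustrative choices, not from the problem):

```python
import numpy as np

d, N = 3, 5                    # feature dimension, number of samples
X = np.random.randn(d, N)      # each COLUMN is a sample, as in this problem
w = np.random.randn(d)         # weight vector, one entry per feature
y = np.random.randn(N)         # one target per sample

pred = X.T @ w                 # (N, d) @ (d,) -> (N,), matches y
assert pred.shape == y.shape == (N,)
```

Note that `X @ w` would fail here: a $(d \times N)$ matrix cannot multiply a $d$-vector, which is exactly the dimension argument above.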
2. Geometric Interpretation of Projection
The normal equation $X X^\top w = X y$ can be rewritten as:

$$X\,(y - X^\top w) = 0$$

The error vector is $e = y - X^\top w$. At the optimal solution, the error vector is orthogonal to the column space of the design matrix $X^\top$ (or the row space of $X$ in this notation). Mathematically, $X\,(y - X^\top w) = 0$, which leads directly to the normal equation.
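The orthogonality claim can be verified numerically. Below is a sketch under the notation above ($X$ is $d \times N$ with columns as samples; the random data is illustrative): solving the normal equation makes the residual orthogonal to every row of $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 20
X = rng.standard_normal((d, N))   # columns are samples
y = rng.standard_normal(N)

# Normal equation: X X^T w = X y  (X X^T is d x d and generically invertible)
w = np.linalg.solve(X @ X.T, X @ y)

e = y - X.T @ w                   # error (residual) vector
print(np.allclose(X @ e, 0))      # orthogonality X e = 0 -> prints True
```

In practice one would use `np.linalg.lstsq(X.T, y)` rather than forming $X X^\top$ explicitly, which is better conditioned; the explicit solve is shown only to mirror the derivation.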
3. "Least Squares" Intuition
We want to find a line (or polynomial curve) that passes closest to all points. "Closest" is defined by the vertical distance (residual) between the point and the line. We square these distances so that positive and negative errors don't cancel each other out, and to penalize large errors more heavily. Minimizing this sum of squared errors gives us the "Least Squares" solution.
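The intuition above can be checked directly: fit a line by least squares and confirm that the fitted $w$ attains a smaller sum of squared vertical distances than any perturbed weight vector. This is a sketch with made-up noisy data (the true slope/intercept and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 0.5 + 0.1 * rng.standard_normal(30)   # noisy line

# Column-sample design matrix: each column is the feature vector [1, x_i]
X = np.vstack([np.ones_like(x), x])                 # shape (2, 30)
w = np.linalg.solve(X @ X.T, X @ y)                 # least-squares fit

sse = np.sum((y - X.T @ w) ** 2)                    # sum of squared residuals
# The least-squares solution is a global minimum: perturbing w never helps.
assert all(np.sum((y - X.T @ (w + p)) ** 2) >= sse
           for p in rng.standard_normal((10, 2)))
```

Squaring is what makes this check work: with signed (unsquared) errors, positive and negative residuals could cancel and the "best" line would not be unique.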