Answer: Least Squares Regression
Prerequisite Knowledge
- Matrix Calculus: Specifically, derivatives of vector norms and quadratic forms.
- $\frac{\partial}{\partial x}(Ax - b)^T(Ax - b) = 2A^T(Ax - b)$.
- $\frac{\partial}{\partial x}\, x^T A x = (A + A^T)x$. (Both identities are checked numerically in the sketch after this list.)
- Linear Algebra: Concepts of rank, inverse, and transpose.
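The two identities above can be sanity-checked with finite differences. This is a minimal sketch, not part of the derivation; the random matrices, the helper `numerical_grad`, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x = rng.standard_normal(d)

def numerical_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# Identity 1: d/dx (Ax - b)^T (Ax - b) = 2 A^T (Ax - b)
f1 = lambda x: (A @ x - b) @ (A @ x - b)
assert np.allclose(numerical_grad(f1, x), 2 * A.T @ (A @ x - b), atol=1e-5)

# Identity 2: d/dx x^T M x = (M + M^T) x  (M square, not necessarily symmetric)
M = rng.standard_normal((d, d))
f2 = lambda x: x @ M @ x
assert np.allclose(numerical_grad(f2, x), (M + M.T) @ x, atol=1e-5)
```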
1. Setup
We are given:
- $y = [y_1, \ldots, y_n]^T \in \mathbb{R}^n$
- $\Phi = [\phi(x_1), \ldots, \phi(x_n)] \in \mathbb{R}^{D \times n}$
- $\theta \in \mathbb{R}^D$
We can write the vector of predicted values as:
$$\hat{y} = \begin{bmatrix} \phi(x_1)^T\theta \\ \vdots \\ \phi(x_n)^T\theta \end{bmatrix} = \begin{bmatrix} \phi(x_1)^T \\ \vdots \\ \phi(x_n)^T \end{bmatrix} \theta = \Phi^T\theta$$
Note that $\Phi^T$ is an $n \times D$ matrix.
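As a concrete illustration of this setup, here is a minimal NumPy sketch. The original does not specify a feature map, so the polynomial map $\phi(x) = (1, x, x^2)$ with $D = 3$ and the sample inputs below are assumptions.

```python
import numpy as np

def phi(x, D=3):
    """Illustrative polynomial feature map: phi(x) = (1, x, x^2, ..., x^(D-1))."""
    return np.array([x**k for k in range(D)])

x_data = np.array([0.0, 0.5, 1.0, 1.5, 2.0])      # n = 5 inputs
Phi = np.column_stack([phi(x) for x in x_data])    # D x n feature matrix
theta = np.array([1.0, -2.0, 0.5])                 # some parameter vector in R^D

y_hat = Phi.T @ theta                              # predictions, one per data point
assert np.allclose(y_hat, np.array([phi(x) @ theta for x in x_data]))
```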
The sum-squared-error is:
$$E(\theta) = \sum_{i=1}^{n} \bigl(y_i - \phi(x_i)^T\theta\bigr)^2 = \|y - \Phi^T\theta\|^2$$
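A quick check that the sum form and the norm form coincide, on random illustrative data (the shapes and names below are assumptions, not part of the original):

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 8, 3
Phi = rng.standard_normal((D, n))   # D x n feature matrix
y = rng.standard_normal(n)
theta = rng.standard_normal(D)

E_sum = sum((y[i] - Phi[:, i] @ theta) ** 2 for i in range(n))
E_norm = np.linalg.norm(y - Phi.T @ theta) ** 2
assert np.isclose(E_sum, E_norm)
```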
2. Derivation
We want to minimize $E(\theta)$ with respect to $\theta$. Let's expand the squared norm:
$$E(\theta) = (y - \Phi^T\theta)^T(y - \Phi^T\theta) = (y^T - \theta^T\Phi)(y - \Phi^T\theta) = y^Ty - y^T\Phi^T\theta - \theta^T\Phi y + \theta^T\Phi\Phi^T\theta$$
Since $y^T\Phi^T\theta$ is a scalar, it equals its transpose $\theta^T\Phi y$. Thus:
$$E(\theta) = y^Ty - 2\theta^T\Phi y + \theta^T\Phi\Phi^T\theta$$
Now, we take the gradient with respect to $\theta$ and set it to zero:
$$\nabla_\theta E(\theta) = -2\Phi y + 2\Phi\Phi^T\theta = 0$$
(Using the identity $\nabla_\theta(\theta^T a) = a$ for the linear term, with $a = \Phi y$, and $\nabla_\theta(\theta^T A\theta) = 2A\theta$ for symmetric $A$, here $A = \Phi\Phi^T$.)
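To gain confidence in the gradient expression, it can be compared against a central-difference approximation on random data; everything below is an illustrative assumption, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(2)
n, D = 8, 3
Phi = rng.standard_normal((D, n))
y = rng.standard_normal(n)
theta = rng.standard_normal(D)

# E(theta) = ||y - Phi^T theta||^2 and its claimed gradient.
E = lambda t: np.linalg.norm(y - Phi.T @ t) ** 2
grad_analytic = -2 * Phi @ y + 2 * Phi @ Phi.T @ theta

# Central-difference approximation of the gradient.
eps = 1e-6
grad_numeric = np.array([
    (E(theta + eps * e) - E(theta - eps * e)) / (2 * eps)
    for e in np.eye(D)
])
assert np.allclose(grad_analytic, grad_numeric, atol=1e-4)
```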
3. Solution (Normal Equation)
Rearranging the equation:
$$\Phi\Phi^T\theta = \Phi y$$
Assuming $\Phi\Phi^T$ (a $D \times D$ matrix) is invertible, which holds exactly when $\Phi$ has full row rank $D$ (the feature dimensions are linearly independent across the data, requiring $n \ge D$), we get:
$$\hat{\theta}_{\mathrm{LS}} = (\Phi\Phi^T)^{-1}\Phi y$$
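A minimal sketch of the closed-form solution in NumPy, cross-checked against `np.linalg.lstsq`. The data are random and illustrative; `np.linalg.solve` is used rather than forming the explicit inverse, which is a standard numerical-stability choice rather than anything stated in the derivation above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, D = 20, 4
Phi = rng.standard_normal((D, n))   # D x n feature matrix, full row rank with high probability
y = rng.standard_normal(n)

# Closed-form least-squares estimate via the normal equations: (Phi Phi^T) theta = Phi y.
theta_ls = np.linalg.solve(Phi @ Phi.T, Phi @ y)

# Cross-check against NumPy's least-squares solver on the n x D system Phi^T theta ≈ y.
theta_ref, *_ = np.linalg.lstsq(Phi.T, y, rcond=None)
assert np.allclose(theta_ls, theta_ref)
```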