Answer

Prerequisites

  • Matrix Calculus: Rules for differentiating matrix expressions with respect to a vector.
  • Least-Squares Regression: Minimization of the sum-squared-error.

Step-by-Step Derivation

  1. Define the Objective Function. The problem asks us to find the parameter vector $\theta$ that minimizes the sum-squared-error, denoted $J(\theta)$:

    $$J(\theta) = \| y - \Phi^T \theta \|^2$$
  2. Expand the Objective Function. We can express the squared $L_2$ norm as an inner product:

    $$\begin{aligned} J(\theta) &= (y - \Phi^T \theta)^T (y - \Phi^T \theta) \\ &= (y^T - \theta^T \Phi)(y - \Phi^T \theta) \\ &= y^T y - y^T \Phi^T \theta - \theta^T \Phi y + \theta^T \Phi \Phi^T \theta \end{aligned}$$
  3. Simplify the Expression. Note that $y^T \Phi^T \theta$ is a scalar: its dimensions are $(1 \times n)(n \times D)(D \times 1) = 1 \times 1$. The transpose of a scalar is itself, so:

    $$(y^T \Phi^T \theta)^T = \theta^T \Phi y$$

    Therefore, the two middle terms are equal, and the objective function simplifies to:

    $$J(\theta) = y^T y - 2 \theta^T \Phi y + \theta^T \Phi \Phi^T \theta$$
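The simplification in steps 2 and 3 can be sanity-checked numerically. The sketch below is only an illustration: the sizes `D` and `n`, the random data, and the function names `J_norm` and `J_expanded` are assumptions, not part of the original problem. It evaluates the norm form and the expanded form of $J(\theta)$ at a random point and also compares the two cross terms.

```python
import numpy as np

# Assumed sizes and random data, chosen only for illustration.
rng = np.random.default_rng(0)
D, n = 3, 8
Phi = rng.normal(size=(D, n))   # D x n, so Phi.T @ theta has length n
y = rng.normal(size=n)
theta = rng.normal(size=D)

def J_norm(theta, Phi, y):
    """Sum-squared-error written as a squared L2 norm."""
    r = y - Phi.T @ theta
    return r @ r

def J_expanded(theta, Phi, y):
    """Expanded form: y^T y - 2 theta^T Phi y + theta^T Phi Phi^T theta."""
    return y @ y - 2 * theta @ Phi @ y + theta @ Phi @ Phi.T @ theta

# The two middle terms of the expansion are the same scalar.
cross_1 = y @ Phi.T @ theta
cross_2 = theta @ Phi @ y

print(J_norm(theta, Phi, y), J_expanded(theta, Phi, y))
print(cross_1, cross_2)
```

Each printed pair should agree up to floating-point rounding.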
  4. Compute the Derivative with Respect to $\theta$. To find the minimum, we take the gradient of $J(\theta)$ with respect to $\theta$ and set it to the zero vector. Using standard matrix calculus identities:

    • $\nabla_\theta (\theta^T a) = a$ for a constant vector $a$
    • $\nabla_\theta (\theta^T A \theta) = (A + A^T)\theta$ for a constant square matrix $A$

    For the symmetric matrix $A = \Phi \Phi^T$, $\nabla_\theta (\theta^T (\Phi \Phi^T) \theta) = 2 \Phi \Phi^T \theta$. Thus:

    $$\frac{\partial J(\theta)}{\partial \theta} = - 2 \Phi y + 2 \Phi \Phi^T \theta$$
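One way to gain confidence in the gradient from step 4 is to compare it with a central finite-difference approximation. This is a minimal sketch under assumed random data: the helper `numerical_grad`, the step size `eps`, and the problem sizes are illustrative choices, not part of the original derivation.

```python
import numpy as np

def J(theta, Phi, y):
    """Sum-squared-error ||y - Phi^T theta||^2."""
    r = y - Phi.T @ theta
    return r @ r

def grad_J(theta, Phi, y):
    """Analytic gradient from the derivation: -2 Phi y + 2 Phi Phi^T theta."""
    return -2 * Phi @ y + 2 * Phi @ Phi.T @ theta

def numerical_grad(f, theta, eps=1e-6):
    """Central finite differences, one coordinate at a time (hypothetical helper)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

# Assumed sizes and random data, for illustration only.
rng = np.random.default_rng(1)
D, n = 4, 10
Phi = rng.normal(size=(D, n))
y = rng.normal(size=n)
theta = rng.normal(size=D)

analytic = grad_J(theta, Phi, y)
numeric = numerical_grad(lambda t: J(t, Phi, y), theta)
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

Because $J$ is quadratic in $\theta$, the central difference has no truncation error, so any remaining discrepancy is floating-point rounding.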
  5. Solve for $\theta$. Set the derivative to zero to find the optimal $\theta$:

    $$\begin{aligned} - 2 \Phi y + 2 \Phi \Phi^T \theta &= 0 \\ \Phi \Phi^T \theta &= \Phi y \end{aligned}$$

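    Assuming $\Phi \Phi^T$ is invertible (in which case the Hessian $2 \Phi \Phi^T$ is positive definite, so this stationary point is the unique minimizer), we multiply both sides on the left by $(\Phi \Phi^T)^{-1}$:

    $$\hat{\theta}_{LS} = (\Phi \Phi^T)^{-1} \Phi y$$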
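The closed-form result can be tried out on synthetic data. The sketch below assumes a small random problem: `theta_true`, the noise level, and the function name `least_squares_theta` are made up for illustration. It solves the normal equations $\Phi \Phi^T \theta = \Phi y$ with `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical choice and not something the derivation itself requires. It then cross-checks against NumPy's general least-squares routine.

```python
import numpy as np

def least_squares_theta(Phi, y):
    """Solve (Phi Phi^T) theta = Phi y; assumes Phi Phi^T is invertible."""
    return np.linalg.solve(Phi @ Phi.T, Phi @ y)

# Synthetic data (illustrative assumption): y = Phi^T theta_true + small noise.
rng = np.random.default_rng(2)
D, n = 3, 50
Phi = rng.normal(size=(D, n))
theta_true = np.array([1.5, -2.0, 0.5])
y = Phi.T @ theta_true + 0.01 * rng.normal(size=n)

theta_hat = least_squares_theta(Phi, y)

# Reference solution: minimize ||Phi^T theta - y||^2 with NumPy's solver.
theta_ref, *_ = np.linalg.lstsq(Phi.T, y, rcond=None)

# The gradient from step 4 should vanish at the minimizer.
grad_at_hat = -2 * Phi @ y + 2 * Phi @ Phi.T @ theta_hat

print(theta_hat)
print(theta_ref)
print(np.max(np.abs(grad_at_hat)))
```

If $\Phi \Phi^T$ is singular or badly conditioned, `np.linalg.solve` may fail or lose accuracy. In that case `np.linalg.lstsq` applied directly to `Phi.T` is the more robust route.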