
Answer

Prerequisite Knowledge

  1. Matrix Inversion Properties: $(kA)^{-1} = \frac{1}{k} A^{-1}$.
  2. Vector Calculus: $\nabla_\theta \|y - \Phi^T \theta\|^2 = -2 \Phi (y - \Phi^T \theta)$ and $\nabla_\theta (\theta^T \theta) = 2\theta$.
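
Both identities can be sanity-checked numerically. This is an illustrative sketch (the matrix shapes and the scalar $k$ are arbitrary choices, not from the problem statement); the gradient identity is compared against a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Identity 1: (kA)^{-1} = (1/k) A^{-1}
k = 2.5
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # shifted to be well-conditioned
assert np.allclose(np.linalg.inv(k * A), np.linalg.inv(A) / k)

# Identity 2: grad of ||y - Phi^T theta||^2 is -2 Phi (y - Phi^T theta)
Phi = rng.standard_normal((3, 5))   # d x n design matrix (d=3 features, n=5 points)
y = rng.standard_normal(5)
theta = rng.standard_normal(3)

def loss(t):
    r = y - Phi.T @ t
    return r @ r

analytic = -2 * Phi @ (y - Phi.T @ theta)

# Central finite difference along each coordinate direction
eps = 1e-6
numeric = np.array([
    (loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(analytic, numeric, atol=1e-5)
```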

Step-by-Step Answer

Part 1: Deriving MAP under i.i.d. assumptions

  1. Substitute $\Gamma$ and $\Sigma$: Start with the general MAP formula from (a)/(b):

    $$\hat{\theta}_{MAP} = (\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T)^{-1} \Phi \Sigma^{-1} y$$

    Substitute $\Gamma = \alpha I$ and $\Sigma = \sigma^2 I$:

    $$\hat{\theta}_{MAP} = ((\alpha I)^{-1} + \Phi (\sigma^2 I)^{-1} \Phi^T)^{-1} \Phi (\sigma^2 I)^{-1} y$$

    Scalars come out of the inverse ($(\alpha I)^{-1} = \frac{1}{\alpha} I$):

    $$\hat{\theta}_{MAP} = \left(\frac{1}{\alpha} I + \frac{1}{\sigma^2} \Phi \Phi^T\right)^{-1} \frac{1}{\sigma^2} \Phi y$$
  2. Simplify: Factor $\frac{1}{\sigma^2}$ out of the matrix being inverted. Let $A = \frac{1}{\alpha} I + \frac{1}{\sigma^2} \Phi \Phi^T = \frac{1}{\sigma^2} \left(\frac{\sigma^2}{\alpha} I + \Phi \Phi^T\right)$. By the property $(kA)^{-1} = \frac{1}{k} A^{-1}$, we get $A^{-1} = \sigma^2 \left(\frac{\sigma^2}{\alpha} I + \Phi \Phi^T\right)^{-1}$.

    Substitute back:

    $$\hat{\theta}_{MAP} = \left[ \sigma^2 \left(\frac{\sigma^2}{\alpha} I + \Phi \Phi^T\right)^{-1} \right] \frac{1}{\sigma^2} \Phi y$$

    The $\sigma^2$ and $\frac{1}{\sigma^2}$ cancel:

    $$\hat{\theta}_{MAP} = \left(\Phi \Phi^T + \frac{\sigma^2}{\alpha} I\right)^{-1} \Phi y$$
  3. Identify $\lambda$: Setting $\lambda = \frac{\sigma^2}{\alpha}$, we get:

    $$\hat{\theta}_{MAP} = (\Phi \Phi^T + \lambda I)^{-1} \Phi y$$

    Since the variances $\sigma^2$ and $\alpha$ are both positive, $\lambda > 0$.
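
The equivalence derived above can be checked numerically: evaluating the general MAP formula with $\Gamma = \alpha I$, $\Sigma = \sigma^2 I$ should give the same estimate as the simplified ridge form with $\lambda = \sigma^2/\alpha$. A minimal sketch with arbitrary dimensions and variance values (not from the problem statement):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 20
Phi = rng.standard_normal((d, n))   # d x n design matrix
y = rng.standard_normal(n)
alpha, sigma2 = 2.0, 0.5            # prior and noise variances (illustrative)

# General MAP formula with Gamma = alpha*I, Sigma = sigma^2*I
Gamma_inv = np.eye(d) / alpha
Sigma_inv = np.eye(n) / sigma2
theta_general = np.linalg.solve(Gamma_inv + Phi @ Sigma_inv @ Phi.T,
                                Phi @ Sigma_inv @ y)

# Simplified ridge form with lambda = sigma^2 / alpha
lam = sigma2 / alpha
theta_ridge = np.linalg.solve(Phi @ Phi.T + lam * np.eye(d), Phi @ y)

assert np.allclose(theta_general, theta_ridge)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the standard numerically stable choice for expressions of the form $M^{-1} v$.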

Part 2: Solving Regularized Least Squares

  1. Define the Objective Function:

    $$J(\theta) = \|y - \Phi^T \theta\|^2 + \lambda \|\theta\|^2 = (y - \Phi^T \theta)^T (y - \Phi^T \theta) + \lambda \theta^T \theta$$
  2. Calculate the Gradient:

    $$\nabla_\theta J(\theta) = \nabla_\theta \left(y^T y - 2 y^T \Phi^T \theta + \theta^T \Phi \Phi^T \theta + \lambda \theta^T \theta\right) = -2 \Phi y + 2 \Phi \Phi^T \theta + 2 \lambda \theta$$
  3. Set the Gradient to Zero:

    $$-2 \Phi y + 2 (\Phi \Phi^T + \lambda I) \theta = 0 \quad\Longrightarrow\quad (\Phi \Phi^T + \lambda I) \theta = \Phi y$$
  4. Solve for $\theta$:

    $$\hat{\theta} = (\Phi \Phi^T + \lambda I)^{-1} \Phi y$$

    This matches the specific MAP estimate derived above.
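
The optimality of this closed form can be verified numerically: the gradient $-2\Phi y + 2(\Phi \Phi^T + \lambda I)\theta$ vanishes at $\hat{\theta}$, and $\hat{\theta}$ attains a lower objective than nearby perturbed points. A minimal sketch with arbitrary dimensions and an illustrative $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 20
Phi = rng.standard_normal((d, n))   # d x n design matrix
y = rng.standard_normal(n)
lam = 0.25                          # regularization strength (illustrative)

# Closed-form regularized least-squares solution
theta_hat = np.linalg.solve(Phi @ Phi.T + lam * np.eye(d), Phi @ y)

def J(t):
    """Regularized least-squares objective."""
    r = y - Phi.T @ t
    return r @ r + lam * t @ t

# The gradient -2 Phi y + 2 (Phi Phi^T + lam I) theta vanishes at theta_hat
grad = -2 * Phi @ y + 2 * (Phi @ Phi.T + lam * np.eye(d)) @ theta_hat
assert np.allclose(grad, 0)

# theta_hat attains a lower objective than random perturbations of itself
for _ in range(5):
    assert J(theta_hat) <= J(theta_hat + 0.1 * rng.standard_normal(d))
```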