Substituting the i.i.d. assumptions into the MAP estimate:

From part (b), the MAP estimate is
$$\hat{\theta}_{\text{MAP}} = \left(\Gamma^{-1} + \Phi \Sigma^{-1} \Phi^T\right)^{-1} \Phi \Sigma^{-1} y$$
We are given $\Gamma = \alpha I$ and $\Sigma = \sigma^2 I$, so the inverses are $\Gamma^{-1} = \frac{1}{\alpha} I$ and $\Sigma^{-1} = \frac{1}{\sigma^2} I$. Substituting these into the MAP equation:
$$\hat{\theta}_{\text{MAP}} = \left(\frac{1}{\alpha} I + \Phi \left(\tfrac{1}{\sigma^2} I\right) \Phi^T\right)^{-1} \Phi \left(\tfrac{1}{\sigma^2} I\right) y$$
Simplifying the algebraic expression:

Factor the scalar $\frac{1}{\sigma^2}$ out of the term being inverted:
$$\hat{\theta}_{\text{MAP}} = \left[\frac{1}{\sigma^2}\left(\frac{\sigma^2}{\alpha} I + \Phi \Phi^T\right)\right]^{-1} \Phi \left(\tfrac{1}{\sigma^2} I\right) y$$
Apply the property $(cA)^{-1} = \frac{1}{c} A^{-1}$, where $c$ is a nonzero scalar:
$$\hat{\theta}_{\text{MAP}} = \sigma^2 \left(\frac{\sigma^2}{\alpha} I + \Phi \Phi^T\right)^{-1} \frac{1}{\sigma^2} \Phi y$$
The $\sigma^2$ factors cancel:
$$\hat{\theta}_{\text{MAP}} = \left(\Phi \Phi^T + \frac{\sigma^2}{\alpha} I\right)^{-1} \Phi y$$
Defining $\lambda = \frac{\sigma^2}{\alpha}$, we obtain the desired form:
$$\hat{\theta}_{\text{MAP}} = \left(\Phi \Phi^T + \lambda I\right)^{-1} \Phi y$$
Since the noise variance $\sigma^2$ is non-negative and the prior variance $\alpha$ must be positive (its inverse appears above), we have $\lambda \geq 0$. This proves the first part.
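The reduction above can be checked numerically. The sketch below (variable names and dimensions are illustrative, not from the text) evaluates the general MAP formula with $\Gamma = \alpha I$ and $\Sigma = \sigma^2 I$, and compares it against the simplified ridge form with $\lambda = \sigma^2/\alpha$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 20                        # illustrative: number of features, number of samples
Phi = rng.standard_normal((d, n))   # design matrix, model y ≈ Phi^T θ
y = rng.standard_normal(n)
alpha, sigma2 = 0.5, 2.0            # prior variance α, noise variance σ²

# General MAP estimate: (Γ⁻¹ + Φ Σ⁻¹ Φᵀ)⁻¹ Φ Σ⁻¹ y with Γ = αI, Σ = σ²I
Gamma_inv = (1 / alpha) * np.eye(d)
Sigma_inv = (1 / sigma2) * np.eye(n)
theta_map = np.linalg.solve(Gamma_inv + Phi @ Sigma_inv @ Phi.T,
                            Phi @ Sigma_inv @ y)

# Simplified ridge form: (ΦΦᵀ + λI)⁻¹ Φ y with λ = σ²/α
lam = sigma2 / alpha
theta_ridge = np.linalg.solve(Phi @ Phi.T + lam * np.eye(d), Phi @ y)

print(np.allclose(theta_map, theta_ridge))  # → True
```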
Solving the regularized least-squares problem:

We now show that the objective function in equation (3.49) leads to the same solution. Let $J(\theta)$ be the objective to minimize:
$$J(\theta) = \|y - \Phi^T \theta\|^2 + \lambda \|\theta\|^2$$
Expand the norms into inner products using $\|x\|^2 = x^T x$:
$$J(\theta) = (y - \Phi^T \theta)^T (y - \Phi^T \theta) + \lambda \theta^T \theta$$
$$J(\theta) = y^T y - y^T \Phi^T \theta - \theta^T \Phi y + \theta^T \Phi \Phi^T \theta + \lambda \theta^T \theta$$
Note that $y^T \Phi^T \theta = (\theta^T \Phi y)^T$; since each is a scalar, it equals its transpose, so the two cross terms combine:
$$J(\theta) = y^T y - 2\theta^T \Phi y + \theta^T \left(\Phi \Phi^T + \lambda I\right) \theta$$
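As a sanity check on the expansion, both forms of $J(\theta)$ can be evaluated at a random point; a minimal sketch (all names and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, lam = 3, 10, 0.7
Phi = rng.standard_normal((d, n))
y = rng.standard_normal(n)
theta = rng.standard_normal(d)

# Original objective: ||y − Φᵀθ||² + λ||θ||²
J_original = np.sum((y - Phi.T @ theta) ** 2) + lam * theta @ theta

# Expanded quadratic: yᵀy − 2θᵀΦy + θᵀ(ΦΦᵀ + λI)θ
J_expanded = (y @ y - 2 * theta @ (Phi @ y)
              + theta @ (Phi @ Phi.T + lam * np.eye(d)) @ theta)

print(np.isclose(J_original, J_expanded))  # → True
```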
Taking the derivative and setting it to zero:

To minimize $J(\theta)$, take the gradient with respect to the vector $\theta$ and set it to zero. Since the Hessian $2(\Phi \Phi^T + \lambda I)$ is positive definite for $\lambda > 0$, $J$ is strictly convex and this stationary point is the unique global minimum:
$$\nabla_\theta J(\theta) = -2\Phi y + 2\left(\Phi \Phi^T + \lambda I\right)\theta = 0$$
$$\left(\Phi \Phi^T + \lambda I\right)\theta = \Phi y$$
Solving for $\theta$:
$$\hat{\theta} = \left(\Phi \Phi^T + \lambda I\right)^{-1} \Phi y$$
This is identical to equation (3.48), proving that the Bayesian MAP estimate with an isotropic Gaussian prior and noise model is mathematically equivalent to the frequentist $L_2$-regularized least-squares solution (ridge regression).
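The optimality of the closed-form solution can be confirmed numerically: the gradient vanishes at $\hat{\theta}$, and random perturbations only increase the objective. A small sketch under illustrative dimensions and $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, lam = 3, 15, 1.0
Phi = rng.standard_normal((d, n))
y = rng.standard_normal(n)

# Closed-form solution: θ̂ = (ΦΦᵀ + λI)⁻¹ Φ y
theta_hat = np.linalg.solve(Phi @ Phi.T + lam * np.eye(d), Phi @ y)

# Gradient ∇J = −2Φy + 2(ΦΦᵀ + λI)θ vanishes at θ̂
grad = -2 * Phi @ y + 2 * (Phi @ Phi.T + lam * np.eye(d)) @ theta_hat
print(np.allclose(grad, 0))  # → True

# J is strictly convex for λ > 0, so perturbing θ̂ increases the objective
J = lambda th: np.sum((y - Phi.T @ th) ** 2) + lam * th @ th
perturbed = [J(theta_hat + 0.1 * rng.standard_normal(d)) for _ in range(5)]
print(all(Jp > J(theta_hat) for Jp in perturbed))  # → True
```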