Answer

Prerequisite Knowledge

  1. Linear Transformation of a Gaussian: If $x \sim \mathcal{N}(\mu, \Sigma)$, then $y = Ax + b$ follows $\mathcal{N}(A\mu + b, A \Sigma A^T)$.
  2. Sum of Independent Gaussians: If $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ are independent, then $Z = X + Y \sim \mathcal{N}(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
  3. Marginalization: $p(y) = \int p(y \mid f)\, p(f)\, df$.
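The first two identities can be checked numerically. The sketch below uses illustrative values (a 2-D Gaussian and a hypothetical map $A$, $b$, none of which come from the text) and verifies the linear-transformation property by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Gaussian: x ~ N(mu, Sigma) in 2D
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Linear map y = A x + b (maps R^2 -> R^1)
A = np.array([[1.0, 2.0]])
b = np.array([3.0])

# Closed-form moments from the transformation property
mean_y = A @ mu + b       # A mu + b
cov_y = A @ Sigma @ A.T   # A Sigma A^T

# Monte Carlo check: transform samples of x and compare empirical moments
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b

print(mean_y, y.mean(axis=0))        # empirical mean should be close
print(cov_y, y.var(axis=0, ddof=1))  # empirical variance should be close
```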

Step-by-Step Answer

Part 1: Distribution of $f_*$

  1. Define $f_*$: The latent function value at the test input is defined as a linear transformation of the parameters:

    $$f_* = \phi(x_*)^T \theta$$
  2. Apply the Linear Transformation Property: The posterior of $\theta$ is $p(\theta \mid \mathcal{D}) = \mathcal{N}(\theta \mid \hat{\mu}_\theta, \hat{\Sigma}_\theta)$. Using the linear transformation property, where $A = \phi(x_*)^T$ is a row vector:

    • Mean: $\mathbb{E}[f_*] = \phi(x_*)^T \mathbb{E}[\theta] = \phi(x_*)^T \hat{\mu}_\theta$
    • Variance: $\operatorname{Var}[f_*] = \phi(x_*)^T \operatorname{Cov}[\theta]\, \phi(x_*) = \phi(x_*)^T \hat{\Sigma}_\theta\, \phi(x_*)$
  3. Result:

    $$p(f_* \mid x_*, \mathcal{D}) = \mathcal{N}(f_* \mid \hat{\mu}_*, \hat{\sigma}_*^2)$$

    where $\hat{\mu}_*$ and $\hat{\sigma}_*^2$ match equations (3.51) and (3.52).
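As a concrete sketch of Part 1: the snippet below assumes a hypothetical posterior (`mu_theta`, `Sigma_theta`) and a polynomial feature map `phi` — illustrative choices, not taken from the source — and computes the moments of $f_*$:

```python
import numpy as np

# Hypothetical posterior over theta (3 polynomial-basis weights);
# in practice these would come from Bayesian linear regression on D.
mu_theta = np.array([0.5, 1.0, -0.2])
Sigma_theta = np.diag([0.1, 0.05, 0.02])

def phi(x):
    """Polynomial feature map phi(x) = [1, x, x^2]."""
    return np.array([1.0, x, x**2])

def latent_predictive(x_star):
    """Mean and variance of f_* = phi(x_*)^T theta under the posterior."""
    p = phi(x_star)
    mu_star = p @ mu_theta           # phi(x_*)^T mu_theta   (cf. eq. 3.51)
    var_star = p @ Sigma_theta @ p   # phi^T Sigma_theta phi (cf. eq. 3.52)
    return mu_star, var_star

mu_star, var_star = latent_predictive(2.0)
print(mu_star, var_star)
```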

Part 2: Distribution of $y_*$

  1. Model Relationship: The observed output is the function value plus noise:

    $$y_* = f_* + \epsilon_*, \quad \epsilon_* \sim \mathcal{N}(0, \sigma^2)$$
  2. Sum of Independent Random Variables: We have the distribution of $f_*$ (from Part 1) and the distribution of $\epsilon_*$ (the noise assumption). Since the new noise $\epsilon_*$ is independent of the past data $\mathcal{D}$ (and hence of $f_*$), the variable $y_*$ is the sum of two independent Gaussian variables.

  3. Compute Moments:

    • Mean: $\mathbb{E}[y_*] = \mathbb{E}[f_*] + \mathbb{E}[\epsilon_*] = \hat{\mu}_* + 0 = \hat{\mu}_*$
    • Variance: $\operatorname{Var}[y_*] = \operatorname{Var}[f_*] + \operatorname{Var}[\epsilon_*] = \hat{\sigma}_*^2 + \sigma^2$
  4. Result:

    $$p(y_* \mid x_*, \mathcal{D}) = \mathcal{N}(y_* \mid \hat{\mu}_*, \hat{\sigma}_*^2 + \sigma^2)$$

    This matches equation (3.53).
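The moment addition in Part 2 can also be verified by sampling. The sketch below uses illustrative latent moments and noise variance (placeholder values, not from the source) and checks that sampling $f_*$ and independent noise separately reproduces the closed-form predictive moments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative latent predictive moments and noise variance
mu_star, var_star = 1.7, 0.62
sigma2 = 0.25

# Closed form: y_* ~ N(mu_star, var_star + sigma2)
mu_y, var_y = mu_star, var_star + sigma2

# Monte Carlo: sample f_* and independent noise, then add them
f = rng.normal(mu_star, np.sqrt(var_star), size=500_000)
eps = rng.normal(0.0, np.sqrt(sigma2), size=500_000)
y = f + eps

print(mu_y, y.mean())        # empirical mean should be close
print(var_y, y.var(ddof=1))  # empirical variance should be close
```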