
Answer

Prerequisites

  • Properties of Covariance
  • Linearity of Integrals
  • Matrix Expansion

Step-by-Step Derivation

  1. Start with the definition of covariance for the distribution $\hat{p}(x)$:
     $$\hat{\Sigma} = \operatorname{cov}_{\hat{p}}(x) = \int \hat{p}(x)\,(x - \hat{\mu})(x - \hat{\mu})^T \, dx$$

  2. Substitute the definition of $\hat{p}(x)$ from Equation (5.5):
     $$\hat{\Sigma} = \int \left( \frac{1}{n} \sum_{i=1}^n \tilde{k}(x - x_i) \right) (x - \hat{\mu})(x - \hat{\mu})^T \, dx$$

  3. Rearrange the sum and integral via linearity:
     $$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n \int \tilde{k}(x - x_i)\,(x - \hat{\mu})(x - \hat{\mu})^T \, dx$$

  4. We can strategically rewrite the term $(x - \hat{\mu})$ by adding and subtracting $x_i$:
     $$x - \hat{\mu} = (x - x_i) + (x_i - \hat{\mu})$$

  5. Apply the change of variables $u = x - x_i$, meaning $du = dx$ and $x - \hat{\mu} = u + (x_i - \hat{\mu})$:
     $$\int \tilde{k}(x - x_i)\,(x - \hat{\mu})(x - \hat{\mu})^T \, dx = \int \tilde{k}(u)\,\big(u + (x_i - \hat{\mu})\big)\big(u + (x_i - \hat{\mu})\big)^T \, du$$

  6. Expand the quadratic term:
     $$\big(u + (x_i - \hat{\mu})\big)\big(u^T + (x_i - \hat{\mu})^T\big) = u u^T + u (x_i - \hat{\mu})^T + (x_i - \hat{\mu})\, u^T + (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$

  7. Substitute this expansion back into the integral and separate it into four distinct integrals:
     $$\int \tilde{k}(u)\, u u^T \, du + \int \tilde{k}(u)\, u (x_i - \hat{\mu})^T \, du + \int \tilde{k}(u)\, (x_i - \hat{\mu})\, u^T \, du + \int \tilde{k}(u)\, (x_i - \hat{\mu})(x_i - \hat{\mu})^T \, du$$

  8. Evaluate each of the four integral terms separately:

    • Term 1: By Equation (5.7), and since $\tilde{k}$ has mean $0$, the second moment equals the covariance of $\tilde{k}$:
      $$\int \tilde{k}(u)\, u u^T \, du = H$$
    • Term 2: Since $(x_i - \hat{\mu})^T$ is constant with respect to $u$, we pull it out of the integral; by Equation (5.6), $\int \tilde{k}(u)\, u \, du = 0$:
      $$\left(\int \tilde{k}(u)\, u \, du\right) (x_i - \hat{\mu})^T = 0 \cdot (x_i - \hat{\mu})^T = 0$$
    • Term 3: Similarly:
      $$(x_i - \hat{\mu}) \left(\int \tilde{k}(u)\, u^T \, du\right) = (x_i - \hat{\mu}) \cdot 0^T = 0$$
    • Term 4: Since $\tilde{k}(u)$ is a probability density, it integrates to $1$:
      $$(x_i - \hat{\mu})(x_i - \hat{\mu})^T \int \tilde{k}(u)\, du = (x_i - \hat{\mu})(x_i - \hat{\mu})^T \cdot 1 = (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$
  9. Summing these four terms, the integral inside the sum becomes:
     $$H + (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$

  10. Substitute this evaluated integral back into the overall sum from step 3:
      $$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n \Big( H + (x_i - \hat{\mu})(x_i - \hat{\mu})^T \Big)$$

  11. Distribute the sum over the two terms:
      $$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n H + \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$

  12. Since $H$ does not depend on the index $i$, summing it $n$ times and dividing by $n$ leaves exactly $H$:
      $$\hat{\Sigma} = H + \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$
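The bookkeeping in steps 10–12 can be sanity-checked numerically. If the kernel $\tilde{k}$ has mean $0$ and covariance $H$ (as in (5.6) and (5.7)), each mixture component of $\hat{p}$ has second moment $H + x_i x_i^T$, so the covariance of $\hat{p}$ can be computed directly from moments and compared with the right-hand side of step 12. A minimal sketch (the particular data and $H$ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.normal(size=(n, d))            # data points x_i as rows
A = rng.normal(size=(d, d))
H = A @ A.T + np.eye(d)                # an arbitrary SPD kernel covariance

mu_hat = X.mean(axis=0)

# Covariance of p_hat from its first two moments:
# E[x x^T] = (1/n) * sum_i (H + x_i x_i^T), then cov = E[x x^T] - mu mu^T.
second_moment = H + (X.T @ X) / n
lhs = second_moment - np.outer(mu_hat, mu_hat)

# Right-hand side of step 12: H plus the biased sample covariance.
S = (X - mu_hat).T @ (X - mu_hat) / n
rhs = H + S

assert np.allclose(lhs, rhs)           # agreement to machine precision
```

The two sides agree exactly (up to floating-point error), since the check replays the same algebra with concrete matrices.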

This proves Equation (5.9).
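Equation (5.9) can also be verified end-to-end by sampling. The sketch below assumes a Gaussian kernel with covariance $H$ (one choice of $\tilde{k}$ satisfying (5.6) and (5.7)): draw from the mixture $\hat{p}$ directly and compare the empirical covariance against $H$ plus the biased sample covariance of the $x_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 2
X = rng.normal(size=(n, d))                # data points x_i as rows
H = np.array([[0.5, 0.1],
              [0.1, 0.3]])                 # kernel covariance (bandwidth matrix)

# Sample from p_hat: pick a component uniformly, then add N(0, H) noise.
m = 1_000_000
idx = rng.integers(0, n, size=m)
L = np.linalg.cholesky(H)
samples = X[idx] + rng.normal(size=(m, d)) @ L.T

lhs = np.cov(samples, rowvar=False, bias=True)   # Monte Carlo cov of p_hat

mu_hat = X.mean(axis=0)
rhs = H + (X - mu_hat).T @ (X - mu_hat) / n      # Equation (5.9)

print(np.abs(lhs - rhs).max())                   # small, shrinking as m grows
```

The discrepancy is pure Monte Carlo error of order $1/\sqrt{m}$, so increasing `m` drives the two sides together.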