-
Start from the definition of the covariance of the distribution $\hat{p}(x)$:
$$\hat{\Sigma} = \operatorname{cov}_{\hat{p}}(x) = \int \hat{p}(x)\,(x - \hat{\mu})(x - \hat{\mu})^T\,dx$$
-
Substitute the definition of $\hat{p}(x)$ from Equation (5.5):
$$\hat{\Sigma} = \int \left(\frac{1}{n}\sum_{i=1}^{n} \tilde{k}(x - x_i)\right)(x - \hat{\mu})(x - \hat{\mu})^T\,dx$$
-
Exchange the sum and the integral by linearity of integration:
$$\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} \int \tilde{k}(x - x_i)\,(x - \hat{\mu})(x - \hat{\mu})^T\,dx$$
-
Rewrite the term $(x - \hat{\mu})$ by adding and subtracting $x_i$:
$$x - \hat{\mu} = (x - x_i) + (x_i - \hat{\mu})$$
-
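Apply the change of variables $u = x - x_i$, so that $du = dx$ and $x - \hat{\mu} = u + (x_i - \hat{\mu})$:
$$\int \tilde{k}(x - x_i)\,(x - \hat{\mu})(x - \hat{\mu})^T\,dx = \int \tilde{k}(u)\,\bigl(u + (x_i - \hat{\mu})\bigr)\bigl(u + (x_i - \hat{\mu})\bigr)^T\,du$$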
-
Expand the quadratic term:
$$\bigl(u + (x_i - \hat{\mu})\bigr)\bigl(u^T + (x_i - \hat{\mu})^T\bigr) = uu^T + u(x_i - \hat{\mu})^T + (x_i - \hat{\mu})u^T + (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$
-
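Substitute this expansion back into the integral and split it into four integrals:
$$\int \tilde{k}(u)\,uu^T\,du + \int \tilde{k}(u)\,u(x_i - \hat{\mu})^T\,du + \int \tilde{k}(u)\,(x_i - \hat{\mu})u^T\,du + \int \tilde{k}(u)\,(x_i - \hat{\mu})(x_i - \hat{\mu})^T\,du$$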
-
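Evaluate each of the four integrals separately:
- Term 1: By Equation (5.7), and because $\tilde{k}$ has mean $0$:
$$\int \tilde{k}(u)\,uu^T\,du = H$$
- Term 2: Since $(x_i - \hat{\mu})^T$ is constant with respect to $u$, it can be pulled out of the integral. From (5.6), $\int \tilde{k}(u)\,u\,du = 0$, so:
$$\left(\int \tilde{k}(u)\,u\,du\right)(x_i - \hat{\mu})^T = 0 \cdot (x_i - \hat{\mu})^T = 0$$
- Term 3: Similarly:
$$(x_i - \hat{\mu})\left(\int \tilde{k}(u)\,u^T\,du\right) = (x_i - \hat{\mu}) \cdot 0^T = 0$$
- Term 4: Since $\tilde{k}(u)$ is a PDF, it integrates to 1:
$$(x_i - \hat{\mu})(x_i - \hat{\mu})^T \int \tilde{k}(u)\,du = (x_i - \hat{\mu})(x_i - \hat{\mu})^T \cdot 1 = (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$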
-
Summing these four terms, the integral inside the sum becomes:
$$H + (x_i - \hat{\mu})(x_i - \hat{\mu})^T$$
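The per-point result can be checked numerically. The sketch below is only an illustration: it assumes that $\tilde{k}$ is a zero-mean Gaussian with covariance $H$ (the derivation itself needs only zero mean and covariance $H$), and the helper name `shifted_second_moment`, the matrix `H`, and the offset `c` (standing in for $x_i - \hat{\mu}$) are hypothetical choices, not values from the text.

```python
import numpy as np

def shifted_second_moment(H, c, n_samples=400_000, seed=0):
    """Monte Carlo estimate of the integral of k(u) (u + c)(u + c)^T du.

    Assumption for illustration: k is a zero-mean Gaussian with covariance H.
    """
    rng = np.random.default_rng(seed)
    u = rng.multivariate_normal(np.zeros(len(c)), H, size=n_samples)
    v = u + c                     # plays the role of u + (x_i - mu_hat)
    return v.T @ v / n_samples    # average of the outer products v v^T

H = np.array([[1.0, 0.3],
              [0.3, 0.5]])        # hypothetical kernel covariance
c = np.array([1.0, -2.0])         # hypothetical offset x_i - mu_hat

estimate = shifted_second_moment(H, c)
expected = H + np.outer(c, c)     # H + (x_i - mu_hat)(x_i - mu_hat)^T

print("Monte Carlo estimate:\n", estimate)
print("H + c c^T:\n", expected)
```

The two printed matrices should agree up to Monte Carlo noise, which reflects the two cross terms averaging out to zero.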
-
Substitute this evaluated integral back into the overall sum from step 3:
$$\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}\left(H + (x_i - \hat{\mu})(x_i - \hat{\mu})^T\right)$$
-
Distribute the sum over the two terms:
$$\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} H + \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})(x_i - \hat{\mu})^T$$
-
Since $H$ does not depend on the index $i$, summing it $n$ times and dividing by $n$ leaves exactly $H$:
$$\hat{\Sigma} = H + \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})(x_i - \hat{\mu})^T$$
This proves Equation (5.9).
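As a sanity check on the final identity, here is a minimal one-dimensional sketch. It assumes a Gaussian kernel with bandwidth `h`, so that $H = h^2$; the function name `kde_variance_1d` and the data values are hypothetical. The variance of $\hat{p}$ is computed by a plain Riemann sum on a fine grid and compared with $h^2$ plus the $1/n$-normalised sample variance.

```python
import numpy as np

def kde_variance_1d(data, h, grid_size=200_001):
    """Variance of a 1D Gaussian KDE, computed by a Riemann sum on a grid."""
    data = np.asarray(data, dtype=float)
    # Extend the grid 10 bandwidths past the data so the tails are negligible.
    lo, hi = data.min() - 10 * h, data.max() + 10 * h
    x = np.linspace(lo, hi, grid_size)
    dx = x[1] - x[0]
    # p_hat(x) = (1/n) * sum_i N(x - x_i; 0, h^2)
    z = (x[:, None] - data[None, :]) / h
    p_hat = np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
    mu_hat = np.sum(x * p_hat) * dx               # mean of p_hat
    return np.sum((x - mu_hat) ** 2 * p_hat) * dx  # variance of p_hat

data = [0.2, 1.5, 1.9, 4.0, 7.3]   # hypothetical sample
h = 0.8                            # hypothetical bandwidth, so H = h**2

lhs = kde_variance_1d(data, h)
rhs = h**2 + np.var(data)          # np.var uses the 1/n normalisation by default

print(lhs, rhs)
```

The two numbers should match to within the discretisation error of the grid. Note that the second term uses the $1/n$ normalisation, as in Equation (5.9), not the $1/(n-1)$ unbiased estimator.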