-
Define the Kernel Density Estimator:
The standard kernel density estimator (KDE) is defined as:
$$\hat p(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h^d}\, k\!\left(\frac{x - x_i}{h}\right)$$
Here, we introduce the rescaled kernel function $\tilde k(z) = \frac{1}{h^d}\, k\!\left(\frac{z}{h}\right)$, which simplifies the KDE expression to:
$$\hat p(x) = \frac{1}{n}\sum_{i=1}^{n} \tilde k(x - x_i)$$
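As a concrete sketch of this sum (assuming a one-dimensional Gaussian kernel for $k$; the kernel choice and the bandwidth $h = 0.5$ are illustrative, not from the text):

```python
import numpy as np

def gaussian_kernel(z, h=0.5):
    """Rescaled 1-d Gaussian kernel: k~(z) = (1/h) k(z/h) with k the N(0,1) pdf."""
    return np.exp(-(z / h) ** 2 / 2) / (h * np.sqrt(2 * np.pi))

def kde(x, samples, h=0.5):
    """p^(x) = (1/n) sum_i k~(x - x_i), vectorized over query points x."""
    x = np.atleast_1d(x)
    # shape (len(x), n): kernel argument for every (query, sample) pair
    diffs = x[:, None] - samples[None, :]
    return gaussian_kernel(diffs, h).mean(axis=1)
```

Each row of `diffs` holds $x - x_i$ for one query point, so averaging over the sample axis implements the $\frac{1}{n}\sum_i$ directly; because $\tilde k$ integrates to one, so does the resulting estimate.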
-
Take the Expectation:
We want to find the expected value of our estimator $\hat p(x)$ over the random sample dataset $X$. We apply the expectation operator $\mathbb{E}_X$:
$$\mathbb{E}_X[\hat p(x)] = \mathbb{E}_X\!\left[\frac{1}{n}\sum_{i=1}^{n} \tilde k(x - x_i)\right]$$
-
Apply Linearity of Expectation:
Since the expectation of a sum is the sum of expectations, we can move E inside the summation:
$$\mathbb{E}_X[\hat p(x)] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{x_i \sim p}\!\left[\tilde k(x - x_i)\right]$$
Because all samples $x_1, \ldots, x_n$ are drawn independently and identically distributed (i.i.d.) from the true density $p(x)$, the expected value $\mathbb{E}\!\left[\tilde k(x - x_i)\right]$ is identical for every $i$. Let's use a dummy variable $\mu$ to represent any sample drawn from $p$:
$$\mathbb{E}_X[\hat p(x)] = \frac{n}{n}\,\mathbb{E}_{\mu \sim p}\!\left[\tilde k(x - \mu)\right] = \mathbb{E}_{\mu \sim p}\!\left[\tilde k(x - \mu)\right]$$
-
Express Expectation as an Integral:
By the definition of the expected value for a continuous random variable $\mu \sim p(\mu)$, the expectation of a function $f(\mu)$ is $\int f(\mu)\, p(\mu)\, d\mu$. Applying this to our function $f(\mu) = \tilde k(x - \mu)$:
$$\int f(\mu)\, p(\mu)\, d\mu = \int \tilde k(x - \mu)\, p(\mu)\, d\mu = \int p(\mu)\, \tilde k(x - \mu)\, d\mu$$
Thus, $\mathbb{E}_X[\hat p(x)] = \int p(\mu)\, \tilde k(x - \mu)\, d\mu$.
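This identity can be checked numerically (a sketch, assuming $p = \mathcal{N}(0,1)$ and a Gaussian kernel with illustrative bandwidth $h = 0.5$): the Monte Carlo average of $\tilde k(x - \mu)$ over samples $\mu \sim p$ should match a quadrature value of $\int p(\mu)\,\tilde k(x-\mu)\,d\mu$:

```python
import numpy as np

h = 0.5
k_tilde = lambda z: np.exp(-(z / h) ** 2 / 2) / (h * np.sqrt(2 * np.pi))  # rescaled kernel
p = lambda mu: np.exp(-mu ** 2 / 2) / np.sqrt(2 * np.pi)                  # true density N(0, 1)

x = 0.7  # arbitrary evaluation point
rng = np.random.default_rng(1)
mu = rng.normal(size=200_000)
monte_carlo = k_tilde(x - mu).mean()  # E_{mu ~ p}[k~(x - mu)]

grid = np.linspace(-8.0, 8.0, 4001)
integral = np.trapz(p(grid) * k_tilde(x - grid), grid)  # ∫ p(mu) k~(x - mu) dmu

print(monte_carlo, integral)  # the two agree up to Monte Carlo error
```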
-
Recognize the Convolution:
The integral of the product of two functions where one is shifted and reversed, $\int f(\mu)\, g(x - \mu)\, d\mu$, is exactly the definition of the convolution $(f * g)(x)$. Thus:
$$\int p(\mu)\, \tilde k(x - \mu)\, d\mu = (p * \tilde k)(x) = p(x) * \tilde k(x)$$
This proves Equation (5.1).
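The convolution identity can also be verified on a grid (a sketch, again assuming $p = \mathcal{N}(0,1)$ and a Gaussian $\tilde k$ with illustrative $h = 0.5$, so the closed form $p * \tilde k = \mathcal{N}(0, 1 + h^2)$ is available for comparison):

```python
import numpy as np

h = 0.5
grid = np.linspace(-8.0, 8.0, 1601)  # odd length, symmetric about 0
dx = grid[1] - grid[0]
p = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)                    # true density N(0, 1)
k_tilde = np.exp(-(grid / h) ** 2 / 2) / (h * np.sqrt(2 * np.pi))  # rescaled kernel

# discretized (p * k~)(x); mode="same" keeps the result aligned with `grid`
conv = np.convolve(p, k_tilde, mode="same") * dx

# closed form: convolving N(0, 1) with N(0, h^2) gives N(0, 1 + h^2)
exact = np.exp(-grid ** 2 / (2 * (1 + h ** 2))) / np.sqrt(2 * np.pi * (1 + h ** 2))
print(np.max(np.abs(conv - exact)))  # small discretization error
```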
-
Conclusion on Bias:
The bias of an estimator is defined as $\operatorname{Bias}(\hat p(x)) = \mathbb{E}[\hat p(x)] - p(x)$.
Thus, the bias of the KDE is $(p * \tilde k)(x) - p(x)$. This tells us that the KDE is generally a biased estimator: its expected value is not exactly the true density $p(x)$, but rather a smoothed (convolved) version of $p(x)$, smeared out by the kernel function $\tilde k$.
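A short simulation illustrates this bias directly (a sketch assuming $p = \mathcal{N}(0,1)$ and a Gaussian kernel; the bandwidth $h = 0.5$, sample size $n = 100$, and trial count are illustrative): averaging $\hat p(0)$ over many independent datasets lands on the smoothed value $(p * \tilde k)(0)$, not on $p(0)$:

```python
import numpy as np

rng = np.random.default_rng(2)
h, n, trials = 0.5, 100, 2000
x0 = 0.0  # evaluation point

def kde_at(x, samples, h):
    """p^(x) = (1/n) sum_i k~(x - x_i) with a Gaussian kernel."""
    z = (x - samples) / h
    return np.mean(np.exp(-z ** 2 / 2)) / (h * np.sqrt(2 * np.pi))

# average the estimator over many independent datasets drawn from N(0, 1)
avg = np.mean([kde_at(x0, rng.normal(size=n), h) for _ in range(trials)])

true_p = 1 / np.sqrt(2 * np.pi)                     # p(0)
smoothed_p = 1 / np.sqrt(2 * np.pi * (1 + h ** 2))  # (p * k~)(0): N(0, 1 + h^2) at 0
print(avg, smoothed_p, true_p)  # avg tracks smoothed_p, not true_p
```

Shrinking $h$ narrows $\tilde k$ and reduces this smoothing bias, at the price of a higher-variance estimate.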