Explanation of KDE Mean and Bias
When we use Kernel Density Estimation (KDE), we place a small "bump" (the kernel function) on top of each observed data point. Summing these bumps gives the estimated density $\hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{x - X_i}{h}\right)$, where $h$ is the bandwidth.
Step-by-Step Derivation Breakdown:
- Expectation: We want to find the "average" shape of our estimated curve if we were to repeat the experiment many times.
- Linearity: Since the estimator is an average of $n$ identically distributed kernels, the expected value of the estimator equals the expected value of a single kernel centered at one random data point.
- Integral: The expected value of a single kernel centered at a random data point is calculated by integrating the kernel value weighted by the probability of the data point falling at $t$, which is $f(t)$: $\mathbb{E}\left[\frac{1}{h}K\!\left(\frac{x - X_i}{h}\right)\right] = \int \frac{1}{h}K\!\left(\frac{x - t}{h}\right) f(t)\,dt$.
- Convolution Result: This integral is mathematically a convolution of the kernel with the true density: $\mathbb{E}[\hat f_h(x)] = (K_h * f)(x)$, where $K_h(u) = \frac{1}{h}K(u/h)$.
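The convolution identity above can be checked numerically. The sketch below (a hedged illustration, not part of the original derivation) averages the KDE over many fresh datasets drawn from a standard normal and compares it to the closed-form convolution; the density, kernel, and all variable names are assumptions for the demo.

```python
import numpy as np

# Monte Carlo check of E[f_hat(x)] = (K_h * f)(x).
# Assumed setup: true density f = N(0, 1), Gaussian kernel, bandwidth h.
rng = np.random.default_rng(0)
h, n, reps = 0.4, 200, 400
x = 0.7  # evaluation point (arbitrary choice)

def kde_at(x, data, h):
    """Gaussian-kernel KDE evaluated at a single point x."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi)))

# Average the estimator over many independent datasets drawn from f.
mc_mean = np.mean([kde_at(x, rng.standard_normal(n), h) for _ in range(reps)])

# Convolving a Gaussian kernel (sd h) with N(0, 1) gives N(0, 1 + h^2).
conv = np.exp(-0.5 * x**2 / (1 + h**2)) / np.sqrt(2 * np.pi * (1 + h**2))
print(mc_mean, conv)  # the two values should agree closely
```

The Monte Carlo mean converges to the convolution value, not to the true density $f(x)$, which is exactly the point of the derivation.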
What does this mean visually?
Imagine the true distribution is a sharp spike. The expected estimated distribution is that sharp spike convolved with the kernel. If the kernel is a Gaussian with bandwidth $h$, the expected estimate is the true spike blurred by a Gaussian of width $h$.
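This blurring can be made concrete with a short sketch (an assumed setup, not from the original text): approximate the "spike" by a narrow box density on a grid, convolve it with a Gaussian kernel, and observe the flattened peak.

```python
import numpy as np

# Assumed demo: a narrow "spike" density blurred by a Gaussian kernel.
grid = np.linspace(-3, 3, 1201)
dx = grid[1] - grid[0]

# Narrow box approximating a spike at 0, normalized to integrate to 1.
spike = np.where(np.abs(grid) < 0.05, 1.0, 0.0)
spike /= spike.sum() * dx

h = 0.5  # kernel bandwidth (illustrative value)
kernel = np.exp(-0.5 * (grid / h) ** 2) / (h * np.sqrt(2 * np.pi))

# Discrete convolution approximating (K_h * f)(x); 'same' keeps the grid size.
expected_kde = np.convolve(spike, kernel, mode="same") * dx

print(spike.max(), expected_kde.max())  # the peak is much lower after smoothing
```

The smoothed curve still integrates to roughly 1, but its peak height drops dramatically: mass that sat in the spike has been spread into the neighboring region.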
- Bias: The difference between the expected estimate and the true density, $\mathrm{Bias}(x) = \mathbb{E}[\hat f_h(x)] - f(x)$.
- Because of this smoothing (convolution), sharp peaks in the true density are underestimated (flattened), and valleys are overestimated (filled in).
- This proves that KDE is inherently biased for any finite bandwidth $h > 0$. We only recover the "truth" in the limit of an infinitely narrow kernel ($h \to 0$) combined with infinite data ($n \to \infty$).
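The peak-flattening bias and its dependence on $h$ can be quantified in a few lines. The sketch below (an assumed example, using a standard normal as the true density) computes the expected KDE height at the peak, $\mathbb{E}[\hat f_h(0)] = \int K_h(0 - t)\, f(t)\,dt$, by numerical integration and shows the bias shrinking as $h$ decreases.

```python
import numpy as np

# Assumed demo: expected KDE height at the peak of a standard normal,
# computed as the integral of K_h(0 - t) * phi(t) over a fine grid.
def expected_kde_at_zero(h):
    grid = np.linspace(-8, 8, 4001)
    dt = grid[1] - grid[0]
    phi = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)  # true density f
    k = np.exp(-0.5 * (grid / h) ** 2) / (h * np.sqrt(2 * np.pi))  # K_h, symmetric
    return np.sum(k * phi) * dt

true_peak = 1 / np.sqrt(2 * np.pi)  # f(0) for the standard normal
for h in (1.0, 0.5, 0.1):
    est = expected_kde_at_zero(h)
    print(h, est, true_peak - est)  # bias at the peak is positive and shrinks with h
```

At the mode the bias is always positive (the peak is underestimated), and it vanishes only in the limit $h \to 0$, matching the statement above.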