Answer
Prerequisites
- Kernel Density Estimation (KDE)
- Bias-Variance Tradeoff
- Properties of Moments in Statistics
Description
From the derivations in parts (a) and (b), we have the first two moments of the kernel density estimate:
- The mean of the estimated distribution perfectly matches the empirical mean of the sample data.
- The covariance of the estimated distribution equals the empirical covariance of the sample plus the covariance of the kernel function itself, which is set by the kernel bandwidth.
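The two moment identities above can be checked numerically. The sketch below is only an illustration under stated assumptions: a 1-D Gaussian kernel with standard deviation `h` (so the kernel variance is `h**2`), a made-up five-point sample, and a helper name `kde_moments` that is hypothetical. Note that the sample variance here uses the 1/n normalization, since that is the variance of the empirical distribution.

```python
import numpy as np

def kde_moments(samples, h, grid_size=20001, pad=10.0):
    """Mean and variance of a 1-D Gaussian KDE, computed by numerical integration."""
    samples = np.asarray(samples, dtype=float)
    # Integrate over a grid that extends `pad` kernel widths beyond the data.
    lo, hi = samples.min() - pad * h, samples.max() + pad * h
    x = np.linspace(lo, hi, grid_size)
    dx = x[1] - x[0]
    # KDE density: average of Gaussian bumps N(x_i, h^2) centred on each sample.
    z = (x[:, None] - samples[None, :]) / h
    p = np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))
    mean = np.sum(x * p) * dx
    var = np.sum((x - mean) ** 2 * p) * dx
    return mean, var

samples = np.array([-1.3, 0.2, 0.9, 2.5, 3.1])  # illustrative data, not from the problem
h = 0.7                                          # assumed bandwidth

kde_mean, kde_var = kde_moments(samples, h)
sample_mean = samples.mean()
sample_var = samples.var()  # ddof=0, i.e. the 1/n empirical variance

print("KDE mean:", kde_mean, "sample mean:", sample_mean)
print("KDE variance:", kde_var, "sample variance + h^2:", sample_var + h**2)
```

The two printed pairs should agree up to integration error, which mirrors the results of parts (a) and (b) in one dimension.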
1. Properties of the Kernel Density Estimate
This result tells us that the kernel density estimate is over-dispersed (over-smoothed). Because the kernel covariance $\Sigma_K$ is a positive semi-definite matrix, the covariance of the estimate, $S + \Sigma_K$ with $S$ the sample covariance, satisfies $S + \Sigma_K \succeq S$. The resulting probability distribution is therefore at least as wide as the empirical distribution formed by the sample points alone, and strictly wider in every direction along which the kernel has nonzero spread. The kernel injects its own "structural spread" into the final representation of the data.
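The ordering claim can be spelled out in one line. Writing $S$ for the sample covariance and $\Sigma_K$ for the kernel covariance (notation assumed here for illustration), for any direction $v$:

$$
v^\top (S + \Sigma_K)\, v \;=\; v^\top S\, v \;+\; v^\top \Sigma_K\, v \;\ge\; v^\top S\, v ,
$$

since $v^\top \Sigma_K\, v \ge 0$ by positive semi-definiteness. The variance of the estimate along every direction is therefore at least the sample variance along that direction, with equality only where $v^\top \Sigma_K\, v = 0$.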
2. Relation to the Bias of the Kernel Density Estimator
In density estimation, the bias at a point is the difference between the expected value of the estimator and the true underlying density that generated the data.
Because the kernel covariance is added to the sample covariance:
- Large kernel covariance (large bandwidth): The kernel strongly smooths the data. Sharp peaks in the true density are flattened and deep valleys are filled in. By widening the distribution, we systematically miss localized features of the true density. This systematic misrepresentation is exactly what a large bias means: the estimator concentrates around the wrong, overly flattened shape.
- Small kernel covariance (small bandwidth): As the bandwidth tends to zero, the added covariance term shrinks and local detail is preserved, which reduces the bias. However, the estimate becomes overly sensitive to the exact locations of individual data points, which produces a spiky estimate and a high variance of the estimator itself.
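The two regimes above can be illustrated with a small simulation. This is a sketch under assumptions that are not part of the original problem: the true density is taken to be the standard normal, the kernel is Gaussian with standard deviation `h`, the estimate is examined only at the peak `x0 = 0`, and the helper `kde_at` is a hypothetical name.

```python
import numpy as np

def kde_at(x0, samples, h):
    """Gaussian KDE evaluated at x0 for each row of `samples` (shape: reps x n)."""
    z = (x0 - samples) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
reps, n, x0 = 2000, 50, 0.0
true_density = 1.0 / np.sqrt(2 * np.pi)  # N(0, 1) density at x0 = 0

# Draw many independent datasets from the assumed true density.
data = rng.standard_normal((reps, n))

results = {}
for h in (0.05, 1.0):  # one small and one large bandwidth
    estimates = kde_at(x0, data, h)
    bias = estimates.mean() - true_density  # Monte Carlo estimate of the bias
    variance = estimates.var()              # spread of the estimator across datasets
    results[h] = (bias, variance)
    print(f"h={h}: bias={bias:.4f}, variance={variance:.5f}")
```

Under these assumptions the large bandwidth should show a clearly negative bias at the peak (the peak is flattened) with little variance, while the small bandwidth should show a bias close to zero but a much larger variance across datasets.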
Therefore, the kernel covariance term in the covariance of the estimate reflects the fundamental smoothing bias of KDE: the larger the kernel bandwidth, the larger the added spread and the larger this bias.
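A sketch of why the kernel covariance drives the bias, assuming the true density $p$ is twice continuously differentiable and the kernel $K$ has zero mean and covariance $\Sigma_K$ (symbols introduced here for illustration). The expected estimate is the true density convolved with the kernel, and a second-order Taylor expansion gives:

$$
\mathbb{E}\big[\hat{p}(x)\big] \;=\; \int K(u)\, p(x - u)\, du \;\approx\; p(x) \;+\; \tfrac{1}{2}\, \operatorname{tr}\!\big(\Sigma_K\, \nabla^2 p(x)\big),
$$

where the first-order term vanishes because the kernel has zero mean. To leading order the bias is therefore $\tfrac{1}{2} \operatorname{tr}\big(\Sigma_K \nabla^2 p(x)\big)$. In one dimension with kernel variance $h^2$ this reduces to $\tfrac{h^2}{2}\, p''(x)$: negative at peaks (where $p'' < 0$), positive in valleys (where $p'' > 0$), and growing with the square of the bandwidth.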