Explanation of KDE Variance Bound

The variance of an estimator tells us how much the estimate fluctuates around its average value across different random datasets.

Key Insight from the Derivation:

$$\text{var}(\hat{p}(x)) \le \frac{C}{nh^d}$$

where $C$ depends on the kernel maximum and the density itself.
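For context, here is a sketch of how a bound of this form is typically derived, assuming a nonnegative kernel bounded by $K_{\max}$ and a density bounded by $p_{\max}$ (these constant names are illustrative, not from the original derivation):

```latex
\begin{aligned}
\text{var}(\hat{p}(x))
  &= \frac{1}{n}\,\text{var}\!\Big(\tfrac{1}{h^d}K\big(\tfrac{x-X_1}{h}\big)\Big)
   \le \frac{1}{n h^{2d}}\,\mathbb{E}\Big[K\big(\tfrac{x-X_1}{h}\big)^2\Big] \\
  &\le \frac{K_{\max}}{n h^{2d}}\,\mathbb{E}\Big[K\big(\tfrac{x-X_1}{h}\big)\Big]
   = \frac{K_{\max}}{n h^{2d}}\int K\big(\tfrac{x-y}{h}\big)\,p(y)\,dy \\
  &= \frac{K_{\max}}{n h^{d}}\int K(u)\,p(x-hu)\,du
   \le \frac{K_{\max}\,p_{\max}\int K(u)\,du}{n h^{d}} =: \frac{C}{n h^d}.
\end{aligned}
```

The substitution $u = (x-y)/h$ in the third line is what produces one factor of $h^d$, leaving the $1/(nh^d)$ rate.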

  1. $1/n$ factor: As we get more data points ($n$ increases), the variance decreases. This is standard for most statistical estimators: more data means more stability.
  2. $1/h^d$ factor: As the bandwidth $h$ gets smaller, the variance increases.
    • Think of it this way: if $h$ is very tiny, the density estimate at $x$ depends only on data points falling extremely close to $x$. This is a rare event, so the count fluctuates wildly (0, 1, or 2 points) between different datasets, leading to high variance.
    • If $h$ is large, we average over a large region, stabilizing the count and reducing variance.
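Both effects are easy to see in a small Monte Carlo experiment. The sketch below (all function names and parameter values are illustrative choices, not from the original) repeatedly draws standard-normal datasets, evaluates a Gaussian-kernel estimate at $x = 0$, and measures how the estimate's variance responds to $h$ and $n$ in $d = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)

def kde_at_point(data, x, h):
    """Gaussian-kernel density estimate p_hat(x) for 1-d data."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h

def variance_of_estimate(n, h, trials=2000):
    """Empirical var(p_hat(0)) across independent N(0,1) datasets of size n."""
    vals = [kde_at_point(rng.standard_normal(n), 0.0, h) for _ in range(trials)]
    return float(np.var(vals))

# Shrinking h at fixed n inflates the variance (the 1/h^d factor, d = 1)...
v_wide   = variance_of_estimate(n=200, h=0.10)
v_narrow = variance_of_estimate(n=200, h=0.05)

# ...while growing n at fixed h shrinks it (the 1/n factor).
v_small_n = variance_of_estimate(n=200,  h=0.10)
v_large_n = variance_of_estimate(n=1600, h=0.10)

print(v_narrow > v_wide, v_large_n < v_small_n)  # True True
```

Halving $h$ roughly doubles the variance here, and an 8-fold increase in $n$ cuts it by roughly 8, matching the $C/(nh^d)$ bound with $d = 1$.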

Bias-Variance Tradeoff:

  • Part (a) (Bias): Small $h$ reduces bias (less smoothing).
  • Part (b) (Variance): Small $h$ increases variance (noisier).

This implies we need to tune $h$ carefully. We want $h \to 0$ as $n \to \infty$ to eliminate bias, but we need $nh^d \to \infty$ to eliminate variance. This means $h$ must shrink, but not too fast relative to the sample size $n$.
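One concrete schedule that satisfies both limits is the classic rate $h_n \propto n^{-1/(d+4)}$ (the proportionality constant is taken to be 1 below purely for illustration). Then $h_n \to 0$ while $nh_n^d = n^{4/(d+4)} \to \infty$:

```python
d = 2  # dimension; any fixed d works the same way

for n in [10**2, 10**4, 10**6]:
    h = n ** (-1.0 / (d + 4))                # h -> 0 as n grows...
    print(f"n={n:>9,}  h={h:.4f}  n*h^d={n * h**d:,.1f}")  # ...yet n*h^d -> infinity
```

For $d = 2$ this prints $h$ shrinking from about 0.46 to 0.10 while $nh^d$ climbs from about 22 to 10,000, so both the bias and the variance bound go to zero simultaneously.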