Skip to main content

Explain

Intuition

The conclusion from part (b) — that the model's variance is exactly the original data's variance plus HH — reveals a profound reality about Kernel methods.

It tells us that Kernel Density Estimation (KDE) strictly inflates the variance of your modeled distribution in comparison to your raw data. Changing the kernel bandwidth (which modifies HH) is equivalent to controlling a "blur" slider on an image editing tool.

The Bias connection

When designing a statistical estimator p^(x)\hat{p}(x), we are constantly fighting a war on two fronts: Variance (instability due to dataset randomness) and Bias (systematic inaccuracy).

  • Turn the "Blur" up (Large HH): We get a smooth, steady, aesthetically pleasing curve. It has low variance (the curve won't drastically change if we sample slightly different data points). However, because we forcibly widened the distribution by adding HH, the curve no longer tightly traces the peaks and drops of the real phenomenon. Because we systematically force the model to be excessively flat, we introduce a massive Bias (an error between our expected smooth curve and the spiky reality).
  • Turn the "Blur" down (Small HH): We faithfully paint a probability mass exactly where the data is, eliminating the HH spread. Our systematic Bias drops to near zero. But your resulting curve will look like a chaotic skyline of sharp needles. It simply memorized the dataset instead of generalizing it (meaning it has exceptionally high variance).

The extra HH term in our covariance tells us: "This is the penalty you pay to get a smooth curve." We trade the accuracy of predicting perfectly sharp local details (Bias) for the global stability of a continuous probability density curve.