Skip to main content

Explain

Intuition

Think of Kernel Density Estimation (KDE) as placing a small pile of sand (representing a unit of probability mass) directly on top of every data point we observe. The shape and width of this sand pile are controlled by our chosen kernel function k~(x)\tilde{k}(x).

Because our kernel function is perfectly balanced (it has a zero mean, meaning it doesn't lean left or right), the "center of mass" for each individual sand pile rests precisely on the data point xix_i where we placed it.

When we calculate the overall mean (or expected value) of our entire estimated distribution p^(x)\hat{p}(x), we are essentially trying to find the joint "center of mass" of all these combined sand piles.

Here is the key insight:

  • Because every single sand pile has exactly the same weight (each contributes 1/n1/n of the total probability).
  • And because each sand pile is perfectly centered on its respective data point (xix_i).

The overall center of mass perfectly aligns with the simple arithmetic average of all the original data points (which is the sample mean). No matter how wide or narrow you make the individual sand piles (i.e. changing the kernel bandwidth), as long as they remain symmetrically centered, they will not shift the overall average.

Diagram: Kernel Sand Piles

graph TD
subgraph Data Points
x1(("x₁"))
x2(("x₂"))
x3(("x₃"))
end

subgraph Kernel Contributions
k1["Centered on x₁\nMean: x₁"]
k2["Centered on x₂\nMean: x₂"]
k3["Centered on x₃\nMean: x₃"]
end

x1 -->|Place Kernel| k1
x2 -->|Place Kernel| k2
x3 -->|Place Kernel| k3

subgraph Overall Distribution p̂(x)
Average["Overall Mean = (x₁ + x₂ + x₃) / 3"]
end

k1 --> Average
k2 --> Average
k3 --> Average