
Answer

Step-by-Step Answer

  1. Properties of $\hat{p}(x)$ deduced from (b): The result $\hat{\Sigma} = S + H$ (where $S$ is the sample covariance) tells us that the kernel density estimate always has a larger covariance than the underlying sample data: the KDE "over-smooths", spreading out the data, and the amount of extra spread is exactly the bandwidth matrix $H$.
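This can be checked numerically. A Gaussian-kernel KDE is a mixture of Gaussians centred at the data points, each with covariance $H$, so samples drawn from it should exhibit covariance $S + H$. A minimal sketch (the data-generating matrix, bandwidth $h$, and sample sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
data = rng.normal(size=(n, d)) @ np.array([[1.0, 0.3], [0.0, 0.8]])

h = 0.5
H = h**2 * np.eye(d)  # isotropic bandwidth matrix

# biased sample covariance (1/n), matching the mixture derivation
xbar = data.mean(axis=0)
S = (data - xbar).T @ (data - xbar) / n

# draw from the KDE: pick a kernel centre uniformly, then add N(0, H) noise
m = 200_000
idx = rng.integers(0, n, size=m)
samples = data[idx] + rng.multivariate_normal(np.zeros(d), H, size=m)
Sigma_hat = np.cov(samples, rowvar=False)

# up to Monte Carlo error, Sigma_hat - S recovers H
print(np.round(Sigma_hat - S, 2))
```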

  2. Relation to Bias: Bias in density estimation refers to $\mathbb{E}[\hat{p}(x)] - p(x)$ (where the expectation is over datasets). Here, however, the relevant notion is the smoothing bias: because the covariance is inflated by $H$, the estimate is biased towards being flatter and wider than the true distribution (assuming the sample covariance is a good estimate of the true covariance).

    Specifically, if the true distribution $p(x)$ has covariance $\Sigma$, and $S \approx \Sigma$, then $\operatorname{cov}_{\hat{p}}(x) \approx \Sigma + H$.

    • If $H$ is large (large bandwidth), the variance is much larger than the true variance (high bias, low variance of the estimator itself).
    • If $H$ is small, we approach the sample variance (low bias, high variance of the estimator).
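The trade-off in these two bullets can be illustrated in one dimension. Assuming a standard-normal target density and repeatedly estimating $p(0)$ with a Gaussian-kernel KDE, the estimator's variance shrinks and its bias grows as the bandwidth $h$ increases (the sample sizes and bandwidth grid below are illustrative choices, not from the original problem):

```python
import numpy as np

rng = np.random.default_rng(1)
true_p0 = 1 / np.sqrt(2 * np.pi)  # N(0,1) density at x = 0

def kde_at_zero(x, h):
    """Gaussian-kernel KDE at 0: (1/(n*h)) * sum_i phi((0 - x_i)/h)."""
    return np.mean(np.exp(-(x / h) ** 2 / 2) / np.sqrt(2 * np.pi)) / h

results = {}
for h in (0.05, 0.5, 2.0):
    # 200 independent datasets of n = 100 points each
    estimates = [kde_at_zero(rng.normal(size=100), h) for _ in range(200)]
    bias = np.mean(estimates) - true_p0
    results[h] = (bias, np.std(estimates))
    print(f"h={h:4}: bias={bias:+.3f}, sd={np.std(estimates):.3f}")
```

The printout shows the pattern from the bullets: small $h$ gives a nearly unbiased but noisy estimate of $p(0)$, while large $h$ gives a stable but badly flattened (downward-biased) one.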

    The term $H$ represents the smoothing bias introduced to obtain a continuous density. This smoothing reduces the variance of the estimated density values at the cost of biasing the moments of the distribution, specifically inflating the second moment.

    In the context of estimating the mean, the KDE is unbiased (as seen in part (a)). In the context of estimating the covariance, it is biased upwards by $H$.
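    For reference, both claims follow from viewing a draw $X \sim \hat{p}$ as a two-stage experiment: pick an index $I$ uniformly from $\{1, \dots, n\}$, then draw $X \sim \mathcal{N}(x_I, H)$. The laws of total expectation and total covariance then give (with $S$ the sample covariance about $\bar{x}$):

$$
\mathbb{E}[X] = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x},
\qquad
\operatorname{cov}(X)
= \underbrace{\mathbb{E}\!\left[\operatorname{cov}(X \mid I)\right]}_{=\,H}
+ \underbrace{\operatorname{cov}\!\left(\mathbb{E}[X \mid I]\right)}_{=\,S}
= S + H.
$$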