
Answer (b)

Prerequisites

  • Variance of a Sum (Independent Variables)
  • Kernel Function Properties
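The first prerequisite can be sanity-checked numerically: for independent variables the variance of a sum equals the sum of the variances, and pulling out a constant squares it. A minimal sketch (the specific normal distributions are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent draws: var(Z1 + Z2) should equal var(Z1) + var(Z2).
z1 = rng.normal(0.0, 1.0, size=1_000_000)   # var = 1
z2 = rng.normal(0.0, 2.0, size=1_000_000)   # var = 4

print(np.var(z1 + z2))   # ≈ 5 = 1 + 4
print(np.var(3 * z1))    # ≈ 9 = 3**2 * 1
```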

Step-by-Step Derivation

  1. Express the KDE: The KDE is defined as $\hat{p}(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h^d}\, k\left(\frac{x - x_i}{h}\right)$.

  2. Apply Variance to the KDE: We want to compute the variance of $\hat{p}(x)$ with respect to the random sample $X = \{x_1, \dots, x_n\}$:

    $$\mathrm{var}_X(\hat{p}(x)) = \mathrm{var}_X \left( \frac{1}{n h^d} \sum_{i=1}^n k\left(\frac{x - x_i}{h}\right) \right)$$
  3. Use Variance Properties for Independent Samples: Since the samples $x_1, \dots, x_n$ are drawn independently and identically distributed (i.i.d.), the variance of their sum is the sum of their variances, $\mathrm{var}(\sum_i Z_i) = \sum_i \mathrm{var}(Z_i)$. Also, extracting a constant $c$ gives $\mathrm{var}(c Z) = c^2 \mathrm{var}(Z)$. Hence

    $$\mathrm{var}_X(\hat{p}(x)) = \frac{1}{n^2 h^{2d}} \sum_{i=1}^n \mathrm{var}_{x_i} \left( k\left(\frac{x - x_i}{h}\right) \right)$$

    Since each $x_i \sim p$ identically, all terms in the sum are equal. Let $\mu \sim p$ be a dummy variable:

    $$\mathrm{var}_X(\hat{p}(x)) = \frac{n}{n^2 h^{2d}} \mathrm{var}_{\mu \sim p} \left( k\left(\frac{x - \mu}{h}\right) \right) = \frac{1}{n h^{2d}} \mathrm{var}_{\mu} \left( k\left(\frac{x - \mu}{h}\right) \right)$$
  4. Apply the Variance Upper Bound (Eq 5.3): Using the hint $\mathrm{var}(Z) \le \mathbb{E}[Z^2]$:

    $$\mathrm{var}_{\mu} \left( k\left(\frac{x - \mu}{h}\right) \right) \le \mathbb{E}_{\mu} \left[ k\left(\frac{x - \mu}{h}\right)^2 \right]$$

    So we get:

    $$\mathrm{var}_X(\hat{p}(x)) \le \frac{1}{n h^{2d}} \mathbb{E}_{\mu} \left[ k\left(\frac{x - \mu}{h}\right)^2 \right]$$
  5. Apply the Kernel Maximum Bound (Eq 5.4): We can split the squared kernel into two factors and bound one of them by the maximum value of the kernel function. Let $\max_z k(z)$ denote the maximum output of the kernel (equivalent to $\max_x k(x)$ in the problem notation):

    $$k\left(\frac{x - \mu}{h}\right)^2 = k\left(\frac{x - \mu}{h}\right) \cdot k\left(\frac{x - \mu}{h}\right) \le \max_z k(z) \cdot k\left(\frac{x - \mu}{h}\right)$$

    Substituting this inequality into the expectation:

    $$\mathbb{E}_{\mu} \left[ k\left(\frac{x - \mu}{h}\right)^2 \right] \le \mathbb{E}_{\mu} \left[ \max_z k(z) \cdot k\left(\frac{x - \mu}{h}\right) \right] = \max_z k(z) \cdot \mathbb{E}_{\mu} \left[ k\left(\frac{x - \mu}{h}\right) \right]$$

    Plugging this back into the variance inequality:

    $$\mathrm{var}_X(\hat{p}(x)) \le \frac{1}{n h^{2d}} \max_z k(z)\, \mathbb{E}_{\mu} \left[ k\left(\frac{x - \mu}{h}\right) \right]$$
  6. Relate Back to the Expected Density $\mathbb{E}[\hat{p}(x)]$: Rearrange the right-hand side to recover the form of $\mathbb{E}[\hat{p}(x)]$ derived in Part (a). Recalling the scaled kernel $\tilde{k}(z) = \frac{1}{h^d} k\left(\frac{z}{h}\right)$, split $\frac{1}{h^{2d}} = \frac{1}{h^d} \cdot \frac{1}{h^d}$ and move one factor inside the expectation:

    $$\mathrm{var}_X(\hat{p}(x)) \le \frac{1}{n h^d} \max_z k(z) \cdot \mathbb{E}_{\mu} \left[ \frac{1}{h^d} k\left(\frac{x - \mu}{h}\right) \right]$$

    We proved in (a) that $\mathbb{E}_{\mu} \left[ \frac{1}{h^d} k\left(\frac{x - \mu}{h}\right) \right] = \mathbb{E}_X[\hat{p}(x)]$. Substituting this in:

    $$\mathrm{var}_X(\hat{p}(x)) \le \frac{1}{n h^d} \max_z k(z)\, \mathbb{E}_X[\hat{p}(x)]$$

    Using the problem's notation $\max_x k(x)$ for the kernel maximum, we obtain:

    $$\mathrm{var}_X(\hat{p}(x)) \le \frac{1}{n h^d} \max_x k(x)\, \mathbb{E}[\hat{p}(x)]$$
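As a sanity check, the final bound can be verified by Monte Carlo simulation. The sketch below is illustrative, not part of the proof: it assumes a 1-D Gaussian kernel, samples drawn from a standard normal $p$, and arbitrary values for $n$, $h$, and the evaluation point $x$.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D Gaussian kernel; its maximum is k(0) = 1/sqrt(2*pi).
def k(z):
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

k_max = 1 / np.sqrt(2 * np.pi)

n, h, d = 50, 0.3, 1        # sample size, bandwidth, dimension (assumed values)
x = 0.5                     # evaluation point (assumed value)
trials = 20_000

# Repeatedly draw a sample of size n from p = N(0, 1) and evaluate the KDE at x,
# giving an empirical distribution of p_hat(x) across independent samples.
samples = rng.normal(size=(trials, n))
p_hat = np.mean(k((x - samples) / h) / h**d, axis=1)   # shape (trials,)

var_lhs = np.var(p_hat)                                # var_X(p_hat(x))
bound_rhs = k_max * np.mean(p_hat) / (n * h**d)        # (1/(n h^d)) max k * E[p_hat(x)]
print(var_lhs, "<=", bound_rhs)
```

In this configuration the empirical variance comes out well below the bound, consistent with the slack introduced by $\mathrm{var}(Z) \le \mathbb{E}[Z^2]$ and the kernel-maximum step.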