Answer

Prerequisites

  • Predictive Distribution
  • Expectation of Posterior (Posterior Mean)
  • Laplace Smoothing (Additive Smoothing)

Step-by-Step Derivation

  1. Formulate the Predictive Distribution: The predictive distribution for a new observation $x$ given the data $\mathcal{D}$ requires integrating over the unknown parameter $\pi$ using its posterior distribution: $$p(x|\mathcal{D}) = \int_0^1 p(x|\pi)\, p(\pi|\mathcal{D})\, d\pi$$

  2. Substitute known equations:

    • $p(x|\pi) = \pi^x (1 - \pi)^{1-x}$ (Equation 3.30)
    • $p(\pi|\mathcal{D}) = \frac{(n+1)!}{s!(n-s)!}\, \pi^s (1 - \pi)^{n-s}$ (Equation 3.33)

    $$p(x|\mathcal{D}) = \int_0^1 \pi^x (1 - \pi)^{1-x} \cdot \frac{(n+1)!}{s!(n-s)!}\, \pi^s (1 - \pi)^{n-s}\, d\pi = \frac{(n+1)!}{s!(n-s)!} \int_0^1 \pi^{s+x} (1 - \pi)^{n-s+1-x}\, d\pi$$

  3. Evaluate for $x=1$: For $x=1$, we want $p(x=1|\mathcal{D})$. Substituting $x=1$: $$p(x=1|\mathcal{D}) = \frac{(n+1)!}{s!(n-s)!} \int_0^1 \pi^{s+1} (1 - \pi)^{n-s}\, d\pi$$ Applying the integration identity from Part (b) with $m = s+1$ and $n' = n-s$: $$\int_0^1 \pi^{s+1} (1 - \pi)^{n-s}\, d\pi = \frac{(s+1)!\,(n-s)!}{\big((s+1)+(n-s)+1\big)!} = \frac{(s+1)!\,(n-s)!}{(n+2)!}$$ $$p(x=1|\mathcal{D}) = \frac{(n+1)!}{s!(n-s)!} \cdot \frac{(s+1)!\,(n-s)!}{(n+2)!} = \frac{(n+1)!\,(s+1)\,s!}{s!\,(n+2)(n+1)!} = \frac{s+1}{n+2}$$
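The cancellation above can be sanity-checked with exact integer arithmetic. A minimal sketch (the function name `predictive_p1` is just for illustration):

```python
from fractions import Fraction
from math import factorial

def predictive_p1(n: int, s: int) -> Fraction:
    # Normalizing constant of the posterior: (n+1)! / (s! (n-s)!)
    coef = Fraction(factorial(n + 1), factorial(s) * factorial(n - s))
    # Integration identity: ∫ π^{s+1} (1-π)^{n-s} dπ = (s+1)!(n-s)! / (n+2)!
    integral = Fraction(factorial(s + 1) * factorial(n - s), factorial(n + 2))
    return coef * integral

# The product collapses to (s+1)/(n+2) for any valid counts
for n, s in [(10, 3), (5, 0), (7, 7)]:
    assert predictive_p1(n, s) == Fraction(s + 1, n + 2)
```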

  4. Evaluate for $x=0$: Since $x$ is binary, $p(x=0|\mathcal{D}) = 1 - p(x=1|\mathcal{D}) = 1 - \frac{s+1}{n+2}$. Direct calculation confirms this: $$p(x=0|\mathcal{D}) = \frac{(n+1)!}{s!(n-s)!} \int_0^1 \pi^s (1 - \pi)^{n-s+1}\, d\pi = \frac{(n+1)!}{s!(n-s)!} \cdot \frac{s!\,(n-s+1)!}{(n+2)!} = \frac{n-s+1}{n+2} = 1 - \frac{s+1}{n+2}$$

  5. Combine into a single expression: Since $x$ can only be 0 or 1, we can write $p(x|\mathcal{D})$ in the same functional form as the Bernoulli distribution: $$p(x|\mathcal{D}) = \left(\frac{s+1}{n+2}\right)^x \left(1 - \frac{s+1}{n+2}\right)^{1-x}$$
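As a numerical cross-check, one can integrate the product $p(x|\pi)\,p(\pi|\mathcal{D})$ directly and compare it against the closed form $\frac{s+1}{n+2}$. A sketch using a midpoint Riemann sum (the step count and test values are arbitrary choices, not from the text):

```python
from math import comb

def posterior(pi: float, n: int, s: int) -> float:
    # Posterior density from Equation 3.33: (n+1)!/(s!(n-s)!) π^s (1-π)^{n-s}
    # Note (n+1)! / (s! (n-s)!) = (n+1) * C(n, s)
    return (n + 1) * comb(n, s) * pi**s * (1 - pi) ** (n - s)

def predictive(x: int, n: int, s: int, steps: int = 100_000) -> float:
    # Midpoint Riemann sum of ∫₀¹ π^x (1-π)^{1-x} p(π|D) dπ
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        pi = (i + 0.5) * h
        total += pi**x * (1 - pi) ** (1 - x) * posterior(pi, n, s) * h
    return total

n, s = 10, 3
assert abs(predictive(1, n, s) - (s + 1) / (n + 2)) < 1e-6
assert abs(predictive(0, n, s) - (n - s + 1) / (n + 2)) < 1e-6
```

This checks both branches of the combined Bernoulli form at once, since the two probabilities must sum to 1.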

  6. Effective Bayesian Estimate and Virtual Samples: The effective Bayesian estimate for $\pi$, namely $p(x=1|\mathcal{D})$, is $\hat{\pi}_{\text{Bayes}} = \frac{s+1}{n+2}$. The Maximum Likelihood Estimate (MLE) is $\hat{\pi}_{\text{MLE}} = \frac{s}{n}$.

    The intuitive explanation is that a uniform prior acts as if we had observed two virtual samples (pseudo-counts) before the actual experiment: one "head" (adding $1$ to the success count $s$) and one "tail" (so the total count $n$ grows by $2$). This prevents the estimate of $\pi$ from being exactly 0 or 1 when we have very little data. This technique is known as Laplace smoothing.
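The pseudo-count effect is easiest to see in a toy comparison (the concrete counts below are illustrative, not from the text):

```python
def pi_mle(n: int, s: int) -> float:
    # Maximum likelihood estimate: the observed success frequency s/n
    return s / n

def pi_bayes(n: int, s: int) -> float:
    # Laplace-smoothed estimate under a uniform prior: one virtual
    # "head" and one virtual "tail" added to the real counts
    return (s + 1) / (n + 2)

# With zero successes in 4 trials, the MLE declares x=1 impossible,
# while the Bayesian estimate stays strictly inside (0, 1).
print(pi_mle(4, 0))    # 0.0
print(pi_bayes(4, 0))  # 0.16666666666666666
```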