Problem 3.8(c) Explanation

Why integrate?

In Bayesian prediction, we don't just pick one "best" value of $\pi$ and predict using that (which is what we do in MLE or MAP). Instead, we consider all possible values of $\pi$, weighted by how likely they are given the data (the posterior).

$$P(\text{Heads} \mid \text{Data}) = \int P(\text{Heads} \mid \pi)\, P(\pi \mid \text{Data})\, d\pi$$

If the posterior is sharp around 0.7, then values near 0.7 dominate this integral. If the posterior is broad (high uncertainty), the integral averages out the predictions from many different values of $\pi$.
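Since $P(\text{Heads} \mid \pi) = \pi$, the integral is just the posterior mean of $\pi$. Here is a minimal numerical check (a sketch using SciPy, with illustrative values $s = 7$, $n = 10$ that are not from the original problem, and the uniform Beta(1, 1) prior) that it matches the closed-form answer $\frac{s+1}{n+2}$:

```python
from scipy.integrate import quad
from scipy.stats import beta

n, s = 10, 7                         # example: 7 heads in 10 flips (illustrative values)
posterior = beta(s + 1, n - s + 1)   # Beta(s+1, n-s+1) posterior under a uniform prior

# P(Heads | Data) = integral of pi * p(pi | Data) over [0, 1]
predictive, _ = quad(lambda p: p * posterior.pdf(p), 0, 1)

print(predictive)          # ~ 0.666667
print((s + 1) / (n + 2))   # exact: 8/12 ~ 0.666667
```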

Laplace Smoothing

The result $\frac{s+1}{n+2}$ is historically famous as Laplace's Rule of Succession. Imagine you see the sun rise for $n$ days in a row (so $s = n$).

  • MLE says the probability of the sun rising tomorrow is $n/n = 1$ (100% certainty). This is risky; just because it happened before doesn't logically guarantee it will happen forever.
  • The Bayesian estimate with a uniform prior says $\frac{n+1}{n+2}$. It's very close to 1 for large $n$, but never exactly 1, leaving a tiny probability for the "black swan" event (compare the numbers below).
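A quick illustration of the gap (a small sketch with arbitrary values of $n$, not part of the original problem):

```python
# Compare the MLE with Laplace's Rule of Succession after n sunrises in a row (s = n).
for n in (1, 10, 100, 10_000):
    mle = n / n                  # always exactly 1.0
    laplace = (n + 1) / (n + 2)  # approaches 1 but never reaches it
    print(f"n={n:>6}  MLE={mle:.1f}  Laplace={laplace:.6f}")
```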

Connection to Pseudocounts

The parameters of the Beta prior, $\alpha$ and $\beta$, can be directly interpreted as pseudocounts.

  • Uniform prior: Beta(1, 1).
    • At first glance you might write the effective successes as $s' = s + (\alpha - 1) = s + 0$, which would suggest a uniform prior adds nothing. But that is the mode (MAP) convention; for the posterior mean, the pseudocounts are $\alpha$ and $\beta$ themselves.
    • Check the mean of $\text{Beta}(\alpha, \beta)$: it is $\frac{\alpha}{\alpha+\beta}$.
    • The posterior is $\text{Beta}(s+1,\, n-s+1)$.
    • Its mean is $\frac{s+1}{(s+1) + (n-s+1)} = \frac{s+1}{n+2}$.
    • This is consistent with starting from $\alpha = 1$, $\beta = 1$:
    • "Virtual samples" count $= \alpha + \beta = 2$.
    • Virtual successes $= \alpha = 1$.
    • Virtual failures $= \beta = 1$.
    • So yes, the uniform prior accounts for 1 virtual success and 1 virtual failure (see the sketch after this list for the general case).
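More generally (a sketch assuming an arbitrary $\text{Beta}(\alpha, \beta)$ prior, which goes slightly beyond the uniform prior used in the problem), the posterior is $\text{Beta}(s+\alpha,\, n-s+\beta)$ and its mean is $\frac{s+\alpha}{n+\alpha+\beta}$: exactly what you'd get by adding $\alpha$ virtual successes and $\beta$ virtual failures to the real counts. The helper below is hypothetical, just to make the pattern concrete:

```python
from scipy.stats import beta

def posterior_mean(s, n, a=1.0, b=1.0):
    """Posterior mean of pi under a Beta(a, b) prior after s successes in n trials."""
    return beta(s + a, n - s + b).mean()

s, n = 7, 10
print(posterior_mean(s, n))             # uniform prior: (7 + 1) / (10 + 2) ~ 0.6667
print((s + 1) / (n + 2))                # rule of succession, same number
print(posterior_mean(s, n, a=2, b=2))   # Beta(2, 2) prior: (7 + 2) / (10 + 4) ~ 0.6429
```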