- Formulate the predictive distribution: The predictive distribution for a new observation $x$ given the data $D$ requires integrating over the unknown parameter $\pi$ using its posterior distribution:
$$p(x \mid D) = \int_0^1 p(x \mid \pi)\, p(\pi \mid D)\, d\pi$$
- Substitute the known expressions:
  - $p(x \mid \pi) = \pi^x (1-\pi)^{1-x}$ (Equation 3.30)
  - $p(\pi \mid D) = \dfrac{(n+1)!}{s!\,(n-s)!}\, \pi^s (1-\pi)^{n-s}$ (Equation 3.33)
$$p(x \mid D) = \int_0^1 \pi^x (1-\pi)^{1-x} \cdot \frac{(n+1)!}{s!\,(n-s)!}\, \pi^s (1-\pi)^{n-s}\, d\pi$$
$$p(x \mid D) = \frac{(n+1)!}{s!\,(n-s)!} \int_0^1 \pi^{s+x} (1-\pi)^{n-s+1-x}\, d\pi$$
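Before evaluating this integral in closed form, it can be checked numerically. The sketch below (the counts $s=3$, $n=10$ are illustrative values chosen here, not from the exercise) approximates the integral with a midpoint rule and confirms that the two predictive probabilities sum to 1:

```python
import math

def predictive_numeric(x, s, n, steps=100_000):
    # (n+1)! / (s! (n-s)!) -- the constant pulled out of the integral
    coeff = math.factorial(n + 1) / (math.factorial(s) * math.factorial(n - s))
    # Midpoint-rule approximation of the integral of pi^(s+x) (1-pi)^(n-s+1-x)
    total = 0.0
    for i in range(steps):
        pi = (i + 0.5) / steps
        total += pi ** (s + x) * (1 - pi) ** (n - s + 1 - x)
    return coeff * total / steps

s, n = 3, 10  # illustrative counts
p1 = predictive_numeric(1, s, n)
p0 = predictive_numeric(0, s, n)
print(p1, p0, p1 + p0)  # the two probabilities should sum to 1
```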
- Evaluate for $x=1$: Substituting $x=1$ gives
$$p(x=1 \mid D) = \frac{(n+1)!}{s!\,(n-s)!} \int_0^1 \pi^{s+1} (1-\pi)^{n-s}\, d\pi$$
Applying the integration identity from Part (b) with $m = s+1$ and $n' = n-s$:
$$\int_0^1 \pi^{s+1} (1-\pi)^{n-s}\, d\pi = \frac{(s+1)!\,(n-s)!}{\big((s+1)+(n-s)+1\big)!} = \frac{(s+1)!\,(n-s)!}{(n+2)!}$$
$$p(x=1 \mid D) = \frac{(n+1)!}{s!\,(n-s)!} \cdot \frac{(s+1)!\,(n-s)!}{(n+2)!} = \frac{(s+1)!\,(n+1)!}{s!\,(n+2)!} = \frac{s+1}{n+2}$$
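The factorial cancellation in the last step can be verified directly. A minimal sketch (the $(s, n)$ pairs below are arbitrary test values, not from the exercise):

```python
import math

def p1_exact(s, n):
    # (n+1)! / (s! (n-s)!)  *  (s+1)! (n-s)! / (n+2)!
    coeff = math.factorial(n + 1) / (math.factorial(s) * math.factorial(n - s))
    integral = (math.factorial(s + 1) * math.factorial(n - s)
                / math.factorial(n + 2))
    return coeff * integral

# The full factorial expression should reduce to (s+1)/(n+2) for any counts.
for s, n in [(0, 1), (3, 10), (7, 7)]:
    assert abs(p1_exact(s, n) - (s + 1) / (n + 2)) < 1e-12
print("factorial identity confirmed")
```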
- Evaluate for $x=0$: Since $x$ is binary, $p(x=0 \mid D) = 1 - p(x=1 \mid D) = 1 - \frac{s+1}{n+2}$.
Alternatively, by direct calculation:
$$p(x=0 \mid D) = \frac{(n+1)!}{s!\,(n-s)!} \int_0^1 \pi^{s} (1-\pi)^{n-s+1}\, d\pi = \frac{(n+1)!}{s!\,(n-s)!} \cdot \frac{s!\,(n-s+1)!}{(n+2)!} = \frac{n-s+1}{n+2} = 1 - \frac{s+1}{n+2}$$
- Combine into a single expression: Since $x$ can only be 0 or 1, $p(x \mid D)$ can be written in the same functional form as the Bernoulli distribution:
$$p(x \mid D) = \left(\frac{s+1}{n+2}\right)^{x} \left(1 - \frac{s+1}{n+2}\right)^{1-x}$$
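As a sanity check, plugging $x=1$ and $x=0$ into this single expression recovers the two cases derived above. A short sketch (again with illustrative counts $s=3$, $n=10$):

```python
s, n = 3, 10  # illustrative counts
p_head = (s + 1) / (n + 2)

def predictive_bernoulli(x):
    # Bernoulli functional form of the posterior predictive
    return p_head ** x * (1 - p_head) ** (1 - x)

print(predictive_bernoulli(1))  # equals (s+1)/(n+2)
print(predictive_bernoulli(0))  # equals (n-s+1)/(n+2)
```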
- Effective Bayesian estimate and virtual samples: The effective Bayesian estimate of $\pi$, i.e. $p(x=1 \mid D)$, is
$$\hat{\pi}_{\text{Bayes}} = \frac{s+1}{n+2},$$
while the maximum likelihood estimate (MLE) is
$$\hat{\pi}_{\text{MLE}} = \frac{s}{n}.$$
The intuitive explanation is that a uniform prior acts as if we had observed two virtual samples (pseudo-counts) before the actual experiment: one "head" (adding 1 to the numerator $s$) and one "tail" (adding 2 in total to the denominator $n$). This prevents the estimate of $\pi$ from collapsing to 0 or 1 when we have very little data. This technique is known as Laplace smoothing.
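The effect of the two virtual samples is easiest to see in an extreme case. A minimal sketch (the scenario of zero heads in three flips is an illustrative example):

```python
def mle(s, n):
    # maximum likelihood estimate: observed fraction of heads
    return s / n

def laplace(s, n):
    # Bayesian estimate under a uniform prior:
    # one virtual head (+1 to s) and one virtual tail (+2 to n in total)
    return (s + 1) / (n + 2)

# With zero heads in three flips, the MLE collapses to 0,
# while Laplace smoothing keeps the estimate away from the boundary.
print(mle(0, 3))      # -> 0.0
print(laplace(0, 3))  # -> 0.2
```

As the data grow ($n \to \infty$ with $s/n$ fixed), the two pseudo-counts become negligible and the Bayesian estimate converges to the MLE.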