Problem 3.8(a) Explanation

The Concept of Likelihood

The function $p(\mathcal{D}|\pi)$ is known as the likelihood function. It tells us how likely the observed data $\mathcal{D}$ is for a specific value of the parameter $\pi$.

  • If we were observing a single coin flip ($n=1$), the likelihood would simply be $\pi$ for heads ($x=1$) and $1-\pi$ for tails ($x=0$).
  • When we observe $n$ repeated independent flips, the combined probability is the product of the individual probabilities, as written out below.
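
One standard way to write out that product (the compact Bernoulli form $\pi^{x_i}(1-\pi)^{1-x_i}$ is a common identity, not something stated in the problem itself) is:

$$p(\mathcal{D}|\pi) = \prod_{i=1}^{n} \pi^{x_i} (1-\pi)^{1-x_i} = \pi^{\sum_i x_i} (1-\pi)^{n - \sum_i x_i} = \pi^{s} (1-\pi)^{n-s}.$$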

Derivation Logic

The core of the derivation relies on counting outcomes. Since a Bernoulli variable $x_i$ can only be 0 or 1:

  • When $x_i = 1$, the term contributes a factor of $\pi$.
  • When $x_i = 0$, the term contributes a factor of $(1-\pi)$.

Therefore, if we observe the sequence $1, 0, 1, 1, 0$:

  • We have three $1$'s and two $0$'s.
  • The probability is $\pi \cdot (1-\pi) \cdot \pi \cdot \pi \cdot (1-\pi) = \pi^3 (1-\pi)^2$.
  • Here, $n=5$ and the sum $s = 1+0+1+1+0 = 3$.
  • The number of $0$'s is $n-s = 5-3 = 2$.
  • The formula $\pi^s (1-\pi)^{n-s}$ generalizes this counting process, as the sketch below illustrates.
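
A minimal numerical check of this counting argument (the function names and the value $\pi = 0.6$ are illustrative choices, not part of the problem): the direct product over flips and the closed form $\pi^s(1-\pi)^{n-s}$ should agree for any $\pi$.

```python
import numpy as np

def likelihood_product(data, pi):
    """Likelihood as a direct product over the individual flips."""
    return np.prod([pi if x == 1 else 1 - pi for x in data])

def likelihood_closed_form(data, pi):
    """Likelihood via the counts: s heads out of n flips."""
    n, s = len(data), sum(data)
    return pi**s * (1 - pi)**(n - s)

data = [1, 0, 1, 1, 0]   # the example sequence: three 1's, two 0's
pi = 0.6                 # an arbitrary bias value for the check
print(likelihood_product(data, pi))      # 0.6^3 * 0.4^2 = 0.03456
print(likelihood_closed_form(data, pi))  # same value
```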

Sufficient Statistic

The quantity $s = \sum x_i$ is called a sufficient statistic for $\pi$. This means that $s$ contains all the information in the data $\mathcal{D}$ that is relevant for estimating $\pi$. Knowing the exact order of heads and tails (e.g., whether we got HHT or HTH) does not change our estimate of the bias $\pi$; only the total number of heads ($s$) and total number of flips ($n$) matter.

The result $p(\mathcal{D}|\pi) = \pi^s (1-\pi)^{n-s}$ shows that the likelihood depends on the data only through $s$: any two datasets with the same $s$ and $n$ give exactly the same likelihood function.
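
To see the sufficiency claim concretely, here is a small sketch (hypothetical function name, arbitrary grid of $\pi$ values) confirming that two different orderings with the same counts, such as HHT and HTH, produce identical likelihood functions.

```python
import numpy as np

def bernoulli_likelihood(data, pi):
    # The likelihood depends on the data only through s = sum(data) and n = len(data)
    n, s = len(data), sum(data)
    return pi**s * (1 - pi)**(n - s)

pis = np.linspace(0.01, 0.99, 99)  # a grid of candidate bias values
hht = [1, 1, 0]                    # "HHT"
hth = [1, 0, 1]                    # "HTH"

# Both sequences have s = 2 heads out of n = 3 flips, so their likelihoods coincide
print(np.allclose(bernoulli_likelihood(hht, pis), bernoulli_likelihood(hth, pis)))  # True
```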