
Problem 3.8(a) Answer

Pre-required Knowledge

  1. Bernoulli Distribution: A discrete probability distribution of a random variable that takes the value 1 with probability $\pi$ and the value 0 with probability $1-\pi$. Its probability mass function (PMF) is $p(x|\pi) = \pi^x (1-\pi)^{1-x}$ for $x \in \{0, 1\}$.

  2. Independent and Identically Distributed (i.i.d.): We assume the samples in the dataset $\mathcal{D}$ are drawn independently from the same distribution. If events $A$ and $B$ are independent, then $P(A \cap B) = P(A)P(B)$. More generally, for independent samples $x_1, \dots, x_n$, the joint probability is the product of the individual probabilities: $p(x_1, \dots, x_n | \pi) = \prod_{i=1}^n p(x_i | \pi)$.

  3. Exponent Rules:

    • $a^b \cdot a^c = a^{b+c}$
    • $(ab)^c = a^c b^c$
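The two definitions above can be checked numerically. The Python sketch below is illustrative only: the helper names `bernoulli_pmf` and `iid_joint_prob` are hypothetical and not part of the original problem. It evaluates the Bernoulli PMF and, under the i.i.d. assumption, forms the joint probability as a product of per-sample terms.

```python
import math


def bernoulli_pmf(x: int, pi: float) -> float:
    """Bernoulli PMF p(x | pi) = pi^x (1 - pi)^(1 - x), defined for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return pi ** x * (1 - pi) ** (1 - x)


def iid_joint_prob(xs, pi: float) -> float:
    """Joint probability of i.i.d. samples: the product of the individual PMFs."""
    return math.prod(bernoulli_pmf(x, pi) for x in xs)


# The single formula reduces to pi when x = 1 and to 1 - pi when x = 0.
print(bernoulli_pmf(1, 0.3))
print(bernoulli_pmf(0, 0.3))

# Joint probability of the (illustrative) samples 1, 0, 1.
print(iid_joint_prob([1, 0, 1], 0.3))
```

For example, `iid_joint_prob([1, 0, 1], 0.3)` multiplies $0.3 \cdot 0.7 \cdot 0.3 = 0.063$ (up to floating-point rounding).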

Step-by-Step Proof

  1. Write down the likelihood of the dataset $\mathcal{D}$: Assuming the samples $\mathcal{D} = \{x_1, \dots, x_n\}$ are i.i.d., the probability of observing the dataset given the parameter $\pi$ is the product of the probabilities of the individual samples.

    $$p(\mathcal{D}|\pi) = \prod_{i=1}^n p(x_i|\pi)$$
  2. Substitute the Bernoulli PMF: Substitute Eq. (3.30), $p(x|\pi) = \pi^x(1-\pi)^{1-x}$, into the product.

    $$p(\mathcal{D}|\pi) = \prod_{i=1}^n \left[ \pi^{x_i} (1-\pi)^{1-x_i} \right]$$
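  3. Group the terms: Since multiplication is commutative and associative, $\prod_i (a_i b_i) = \left(\prod_i a_i\right)\left(\prod_i b_i\right)$, so we can collect the $\pi$ factors and the $(1-\pi)$ factors separately.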

    $$p(\mathcal{D}|\pi) = \left( \prod_{i=1}^n \pi^{x_i} \right) \cdot \left( \prod_{i=1}^n (1-\pi)^{1-x_i} \right)$$
  4. Apply the product rule for exponents: Recall that $\prod_i a^{b_i} = a^{\sum_i b_i}$, which follows from applying $a^b \cdot a^c = a^{b+c}$ repeatedly.

    $$p(\mathcal{D}|\pi) = \pi^{\sum_{i=1}^n x_i} \cdot (1-\pi)^{\sum_{i=1}^n (1-x_i)}$$
  5. Simplify the exponents: Let $s = \sum_{i=1}^n x_i$. This is the sum of the samples, i.e. the number of successes (heads). The exponent of the second factor is:

    $$\sum_{i=1}^n (1-x_i) = \sum_{i=1}^n 1 - \sum_{i=1}^n x_i = n - s$$

    Here $n$ is the total number of samples.

  6. Final Result: Substitute $s$ and $n-s$ back into the equation.

    $$p(\mathcal{D}|\pi) = \pi^s (1-\pi)^{n-s}$$

    This matches Eq. (3.31).
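As a sanity check on Eq. (3.31), the Python sketch below evaluates the likelihood two ways: as the per-sample product from step 2 and as the closed form $\pi^s (1-\pi)^{n-s}$ from step 6. The helper names `likelihood_product` and `likelihood_closed_form` are hypothetical, and the randomly generated binary datasets are purely illustrative. Up to floating-point rounding the two values should agree, and any two datasets with the same $n$ and $s$ should give the same likelihood.

```python
import math
import random


def likelihood_product(xs, pi: float) -> float:
    """Step 2 form: prod_i pi^{x_i} (1 - pi)^{1 - x_i}."""
    return math.prod(pi ** x * (1 - pi) ** (1 - x) for x in xs)


def likelihood_closed_form(xs, pi: float) -> float:
    """Step 6 form: pi^s (1 - pi)^(n - s), where s is the number of ones."""
    n = len(xs)
    s = sum(xs)
    return pi ** s * (1 - pi) ** (n - s)


# Compare both forms on a few random binary datasets (illustrative data only).
rng = random.Random(0)
for _ in range(5):
    xs = [rng.randint(0, 1) for _ in range(20)]
    pi = rng.uniform(0.05, 0.95)
    a = likelihood_product(xs, pi)
    b = likelihood_closed_form(xs, pi)
    print(f"s={sum(xs):2d}  product={a:.6e}  closed={b:.6e}  match={math.isclose(a, b, rel_tol=1e-12)}")

# The likelihood depends on the data only through n and s:
# [1, 1, 0, 0] and [0, 1, 0, 1] both have n = 4 and s = 2.
print(likelihood_closed_form([1, 1, 0, 0], 0.4), likelihood_product([0, 1, 0, 1], 0.4))
```

Because only $n$ and $s$ enter the closed form, reordering the samples never changes the likelihood, which is what the last line illustrates.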