Answer

Prerequisites

  • Maximum A Posteriori (MAP) Estimation
  • Beta Distribution parameters
  • Derivative of Log-Posterior

Step-by-Step Derivation

  1. Analyze the Priors as Beta Distributions: The Beta distribution is given by $\mathrm{Beta}(\pi; \alpha, \beta) \propto \pi^{\alpha-1}(1-\pi)^{\beta-1}$.

    • For $p_1(\pi) = 2\pi \propto \pi^1(1-\pi)^0$, this matches a Beta distribution with $\alpha = 2$, $\beta = 1$.
    • For $p_0(\pi) = 2 - 2\pi = 2(1-\pi) \propto \pi^0(1-\pi)^1$, this matches a Beta distribution with $\alpha = 1$, $\beta = 2$.
  2. Calculate MAP Estimate for $p_1(\pi) = 2\pi$: The posterior is $p_1(\pi|\mathcal{D}) \propto p(\mathcal{D}|\pi)\, p_1(\pi) = \left[\pi^s (1-\pi)^{n-s}\right]\pi = \pi^{s+1}(1-\pi)^{n-s}$. Take the log-posterior: $\log p_1(\pi|\mathcal{D}) = (s+1)\log\pi + (n-s)\log(1-\pi) + C$. Differentiate and set to 0: $\frac{s+1}{\pi} - \frac{n-s}{1-\pi} = 0 \implies (s+1)(1-\pi) = (n-s)\pi \implies s+1 - (s+1)\pi = n\pi - s\pi \implies s+1 = (n+1)\pi$, giving $\hat{\pi}_{\mathrm{MAP},1} = \frac{s+1}{n+1}$.

  3. Calculate MAP Estimate for $p_0(\pi) = 2(1-\pi)$: The posterior is $p_0(\pi|\mathcal{D}) \propto p(\mathcal{D}|\pi)\, p_0(\pi) \propto \left[\pi^s (1-\pi)^{n-s}\right](1-\pi) = \pi^s(1-\pi)^{n-s+1}$. Take the log-posterior: $\log p_0(\pi|\mathcal{D}) = s\log\pi + (n-s+1)\log(1-\pi) + C$. Differentiate and set to 0: $\frac{s}{\pi} - \frac{n-s+1}{1-\pi} = 0 \implies s(1-\pi) = (n-s+1)\pi \implies s = (n+1)\pi$, giving $\hat{\pi}_{\mathrm{MAP},0} = \frac{s}{n+1}$.

  4. Effective Bayesian Estimates (Predictive Mean): The posterior distributions are $\mathrm{Beta}(s+2,\, n-s+1)$ for $p_1$ and $\mathrm{Beta}(s+1,\, n-s+2)$ for $p_0$. The mean of $\mathrm{Beta}(\alpha, \beta)$ is $\frac{\alpha}{\alpha+\beta}$.

    • For $p_1$: $\hat{\pi}_{\mathrm{Bayes},1} = \frac{s+2}{(s+2) + (n-s+1)} = \frac{s+2}{n+3}$
    • For $p_0$: $\hat{\pi}_{\mathrm{Bayes},0} = \frac{s+1}{(s+1) + (n-s+2)} = \frac{s+1}{n+3}$
  5. Intuitive Explanation (Virtual Samples):

    • Prior $p_1(\pi) = 2\pi$ (Beta(2,1)): The form $\pi^1(1-\pi)^0$ is equivalent to having observed 1 virtual "head" before the experiment. Hence the MAP adds 1 to the numerator (heads) and 1 to the denominator (total flips). The Bayesian expectation $\frac{s+2}{n+3}$ corresponds to pseudo-counts of 2 heads and 1 tail: the single virtual head on top of the uniform prior's 1 head and 1 tail (Laplace smoothing).
    • Prior $p_0(\pi) = 2-2\pi$ (Beta(1,2)): The form $\pi^0(1-\pi)^1$ represents observing 1 virtual "tail" before the actual data. The MAP adds 1 to the denominator (total flips) but nothing to the numerator (heads). The Bayesian expectation $\frac{s+1}{n+3}$ corresponds to pseudo-counts of 1 head and 2 tails, skewing the estimate downward.
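
The closed-form estimates above can be sanity-checked numerically. A minimal sketch, where the data values (n = 10 flips, s = 7 heads) are hypothetical and chosen only for illustration:

```python
import math
from fractions import Fraction

# Hypothetical data for illustration: n = 10 flips, s = 7 heads.
n, s = 10, 7

# Closed-form estimates derived above.
map_1 = Fraction(s + 1, n + 1)    # MAP under p1(pi) = 2*pi      (Beta(2,1) prior)
map_0 = Fraction(s, n + 1)        # MAP under p0(pi) = 2 - 2*pi  (Beta(1,2) prior)
bayes_1 = Fraction(s + 2, n + 3)  # posterior mean under p1
bayes_0 = Fraction(s + 1, n + 3)  # posterior mean under p0

def log_posterior(pi, extra_heads, extra_tails):
    """Unnormalized log-posterior: pi^(s + extra_heads) * (1-pi)^(n - s + extra_tails)."""
    return (s + extra_heads) * math.log(pi) + (n - s + extra_tails) * math.log(1 - pi)

# Cross-check the MAP formulas by maximizing each log-posterior on a fine grid.
grid = [i / 10_000 for i in range(1, 10_000)]
argmax_1 = max(grid, key=lambda p: log_posterior(p, 1, 0))  # one virtual head
argmax_0 = max(grid, key=lambda p: log_posterior(p, 0, 1))  # one virtual tail

assert abs(argmax_1 - float(map_1)) < 1e-3  # matches (s+1)/(n+1)
assert abs(argmax_0 - float(map_0)) < 1e-3  # matches s/(n+1)
```

Using exact `Fraction` arithmetic makes the pseudo-count interpretation visible directly: for these data the four estimates are 8/11, 7/11, 9/13, and 8/13.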