Answer

Prerequisites

  • Maximum Likelihood Estimation (MLE)
  • Maximum A Posteriori Estimation (MAP)
  • Beta Distribution Mode

Step-by-Step Derivation

  1. Maximum Likelihood Estimate (MLE): The MLE $\hat{\pi}_{MLE}$ maximizes the likelihood function $p(\mathcal{D}\mid\pi) = \pi^s(1-\pi)^{n-s}$. To find the maximum, we take the log-likelihood:

     $$\mathcal{L}(\pi) = \log p(\mathcal{D}\mid\pi) = s \log\pi + (n-s)\log(1-\pi)$$

     Take the derivative with respect to $\pi$ and set it to 0:

     $$\frac{\partial \mathcal{L}}{\partial \pi} = \frac{s}{\pi} - \frac{n-s}{1-\pi} = 0$$

     $$s(1-\pi) = (n-s)\pi \implies s - s\pi = n\pi - s\pi \implies n\pi = s$$

     $$\hat{\pi}_{MLE} = \frac{s}{n}$$

  2. Maximum A Posteriori (MAP) Estimate with Uniform Prior: The MAP estimate $\hat{\pi}_{MAP}$ maximizes the posterior distribution $p(\pi\mid\mathcal{D})$. Recall the posterior from part (b):

     $$p(\pi\mid\mathcal{D}) \propto \pi^s (1-\pi)^{n-s}$$

     Since the uniform prior $p(\pi) = 1$ is a constant, the posterior is exactly proportional to the likelihood. Therefore, maximizing the posterior is mathematically identical to maximizing the likelihood:

     $$\hat{\pi}_{MAP} = \operatorname{argmax}_\pi p(\pi\mid\mathcal{D}) = \operatorname{argmax}_\pi \left[ p(\mathcal{D}\mid\pi) \cdot 1 \right] = \operatorname{argmax}_\pi p(\mathcal{D}\mid\pi)$$

     Thus, using a uniform prior:

     $$\hat{\pi}_{MAP} = \hat{\pi}_{MLE} = \frac{s}{n}$$

  3. Comparing the Estimates:

    • MLE vs. MAP (Uniform): They are identical ($\frac{s}{n}$). Neither has a mathematical advantage over the other in this specific case, because the uniform prior adds no extra information (it is uninformative). The uniform prior represents complete prior uncertainty, or lack of bias, so the data alone dictate the peak (mode) of the belief.
    • Bayesian Predictive Estimate: From part (c), $\hat{\pi}_{Bayes} = \frac{s+1}{n+2}$. The advantage of the Bayesian predictive estimate (the mean of the posterior, not the mode as with MAP) over MLE/MAP is robustness against extreme values, especially for small sample sizes $n$. If $n = 1, s = 1$, MLE/MAP gives $1.0$ (absolute certainty of heads forever), while the Bayesian estimate yields $2/3$ (more cautious).
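The derivations in steps 1 and 2 can be checked numerically. A minimal sketch, assuming illustrative values $n = 10$, $s = 7$ (my own choice, not from the problem statement): a grid search over the log-likelihood should recover the closed-form maximizer $s/n$, and the uniform prior changes nothing, since the log-posterior differs from the log-likelihood only by a constant.

```python
import math

# Illustrative data (hypothetical values, not from the problem): n flips, s heads
n, s = 10, 7

def log_likelihood(pi: float) -> float:
    """L(pi) = s*log(pi) + (n - s)*log(1 - pi)."""
    return s * math.log(pi) + (n - s) * math.log(1 - pi)

# Under the uniform prior p(pi) = 1, the log-posterior is the log-likelihood
# plus a constant, so both have the same maximizer: this grid search finds
# the MLE and the MAP estimate at once.
grid = [i / 100000 for i in range(1, 100000)]
pi_hat = max(grid, key=log_likelihood)

print(pi_hat)   # matches the closed form s/n = 0.7
```

The grid deliberately excludes the endpoints 0 and 1, where the log-likelihood diverges to $-\infty$ whenever $0 < s < n$.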
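The contrast in the last bullet can also be sketched in code. This is a toy illustration with hypothetical function names of my own; it shows the small-sample case from the text and that the two estimates converge as $n$ grows.

```python
def pi_mle(s: int, n: int) -> float:
    """MLE (= MAP under a uniform prior): s / n."""
    return s / n

def pi_bayes(s: int, n: int) -> float:
    """Bayesian predictive estimate (posterior mean): (s + 1) / (n + 2)."""
    return (s + 1) / (n + 2)

# Extreme small-sample case from the text: one flip, one head
print(pi_mle(1, 1))    # 1.0 -- absolute certainty of heads after a single flip
print(pi_bayes(1, 1))  # 0.666... -- more cautious

# With more data the pseudo-counts wash out and the estimates converge
print(pi_mle(700, 1000), pi_bayes(700, 1000))  # 0.7 vs. 701/1002
```

The $+1$ and $+2$ act as pseudo-counts (one imagined head and one imagined tail), which is why the estimate can never reach the extremes 0 or 1.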