1. Prior Identification as Beta Distributions
First, let's map these priors to the standard Beta distribution $\mathrm{Beta}(\alpha,\beta) \propto \pi^{\alpha-1}(1-\pi)^{\beta-1}$:
- $p_1(\pi) = 2\pi \propto \pi^{2-1}(1-\pi)^{1-1}$, i.e. $\mathrm{Beta}(\alpha=2, \beta=1)$.
- $p_0(\pi) = 2(1-\pi) \propto \pi^{1-1}(1-\pi)^{2-1}$, i.e. $\mathrm{Beta}(\alpha=1, \beta=2)$.
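As a quick sanity check (a minimal sketch assuming Python with numpy/scipy; note the normalizing constant $1/B(\alpha,\beta)$ happens to equal 2 for both priors, so we get exact equality, not just proportionality):

```python
import numpy as np
from scipy.stats import beta

# Grid of pi values strictly inside (0, 1)
pi = np.linspace(0.01, 0.99, 99)

# p1(pi) = 2*pi is exactly the Beta(2, 1) pdf,
# p0(pi) = 2*(1 - pi) is exactly the Beta(1, 2) pdf.
assert np.allclose(2 * pi, beta.pdf(pi, a=2, b=1))
assert np.allclose(2 * (1 - pi), beta.pdf(pi, a=1, b=2))
```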
2. Calculate MAP Estimates
MAP maximizes the posterior: Posterior $\propto$ Likelihood $\times$ Prior. For $s$ successes in $n$ Bernoulli trials, the likelihood is $\pi^s(1-\pi)^{n-s}$.
Case 1: Prior $p_1$ ($\alpha=2, \beta=1$)
- Posterior $\propto \pi^s(1-\pi)^{n-s} \cdot \pi^1 = \pi^{s+1}(1-\pi)^{n-s}$.
- This is proportional to $\mathrm{Beta}(s+2,\ n-s+1)$.
- We maximize $f(\pi) = \pi^{s+1}(1-\pi)^{n-s}$.
- Log-posterior: $(s+1)\ln\pi + (n-s)\ln(1-\pi)$.
- Derivative $= 0$: $\frac{s+1}{\pi} - \frac{n-s}{1-\pi} = 0$.
- $(s+1)(1-\pi) = (n-s)\pi$.
- $s+1-s\pi-\pi = n\pi - s\pi$.
- $s+1 = (n+1)\pi$.
- $\hat{\pi}_{\mathrm{MAP},1} = \frac{s+1}{n+1}$.
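To double-check this closed form, here is a small grid-search sketch (assuming Python with numpy; the sample values $n=10$, $s=7$ are hypothetical, not from the question):

```python
import numpy as np

n, s = 10, 7  # hypothetical data: 7 heads in 10 tosses
pi = np.linspace(1e-6, 1 - 1e-6, 200_001)

# Log-posterior under prior p1 = Beta(2, 1), up to an additive constant
log_post = (s + 1) * np.log(pi) + (n - s) * np.log(1 - pi)

print(pi[np.argmax(log_post)])  # ~0.72727, matching the closed form
print((s + 1) / (n + 1))        # 8/11 = 0.72727...
```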
Case 2: Prior $p_0$ ($\alpha=1, \beta=2$)
- Posterior $\propto \pi^s(1-\pi)^{n-s} \cdot (1-\pi)^1 = \pi^s(1-\pi)^{n-s+1}$.
- This is proportional to $\mathrm{Beta}(s+1,\ n-s+2)$.
- We maximize $f(\pi) = \pi^s(1-\pi)^{n-s+1}$.
- Derivative $= 0$: $\frac{s}{\pi} - \frac{n-s+1}{1-\pi} = 0$.
- $s(1-\pi) = (n-s+1)\pi$.
- $s - s\pi = n\pi - s\pi + \pi$.
- $s = (n+1)\pi$.
- $\hat{\pi}_{\mathrm{MAP},0} = \frac{s}{n+1}$.
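Both results also follow from the standard closed-form mode of a Beta distribution, $\mathrm{mode}(\mathrm{Beta}(a,b)) = \frac{a-1}{a+b-2}$. A minimal check (same hypothetical $n=10$, $s=7$ as above):

```python
def beta_mode(a, b):
    """Mode of Beta(a, b) via the standard closed form (a - 1) / (a + b - 2)."""
    return (a - 1) / (a + b - 2)

n, s = 10, 7  # same hypothetical data as above

# Posterior under p1 is Beta(s+2, n-s+1); under p0 it is Beta(s+1, n-s+2)
print(beta_mode(s + 2, n - s + 1), (s + 1) / (n + 1))  # both 0.72727...
print(beta_mode(s + 1, n - s + 2), s / (n + 1))        # both 0.63636...
```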
3. Effective Estimates (Bayesian Mean)
The question also asks "What is the effective estimate...". This phrasing is ambiguous: it could refer to the MAP estimate or to the posterior mean. The "virtual samples" interpretation in Section 4 applies to both, but standard Bayesian prediction uses the mean.
Let's compute the Mean (Posterior Expectation) for completeness.
For $p_1$ (posterior $\mathrm{Beta}(s+2,\ n-s+1)$):
- Mean $= \frac{\alpha_{\mathrm{post}}}{\alpha_{\mathrm{post}} + \beta_{\mathrm{post}}} = \frac{s+2}{(s+2)+(n-s+1)} = \frac{s+2}{n+3}$.
For $p_0$ (posterior $\mathrm{Beta}(s+1,\ n-s+2)$):
- Mean $= \frac{s+1}{(s+1)+(n-s+2)} = \frac{s+1}{n+3}$.
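These means can be confirmed directly from the conjugate posteriors (a sketch assuming Python with scipy; $n=10$, $s=7$ remain hypothetical):

```python
from scipy.stats import beta

n, s = 10, 7  # same hypothetical data

# scipy's Beta mean vs. the closed forms derived above
print(beta.mean(a=s + 2, b=n - s + 1), (s + 2) / (n + 3))  # both 9/13
print(beta.mean(a=s + 1, b=n - s + 2), (s + 1) / (n + 3))  # both 8/13
```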
(Note: the intuitive explanation below focuses on the MAP estimates, since the question specifically asked to "Calculate the MAP estimates".)
4. Intuitive Explanation ("Virtual" Samples)
We interpret the hyperparameters $\alpha, \beta$ of the prior as virtual counts added to the actual data.
Prior $\propto \pi^{\alpha-1}(1-\pi)^{\beta-1}$.
MAP estimate formula for a Beta posterior: $\frac{s+(\alpha-1)}{n+(\alpha+\beta-2)}$.
For $p_1$ ($\alpha=2, \beta=1$):
- Virtual Samples: We added 1 sample, which was a success.
- Total virtual $N' = 1$. Total virtual $S' = 1$.
- MAP Estimate: $\frac{s+1}{n+1}$.
- Interpretation: The prior $2\pi$ acts like we've already observed one Head. This biases the result towards 1.
For $p_0$ ($\alpha=1, \beta=2$):
- Virtual Samples: We added 1 sample, which was a failure.
- Total virtual $N' = 1$. Total virtual $S' = 0$.
- MAP Estimate: $\frac{s}{n+1}$.
- Interpretation: The prior $2(1-\pi)$ acts like we've already observed one Tail. This biases the result towards 0.
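The pseudo-count view translates directly into code (a sketch with the same hypothetical $n=10$, $s=7$): the general MAP formula is just (actual + virtual successes) over (actual + virtual samples).

```python
def map_estimate(s, n, alpha, beta_):
    """MAP under a Beta(alpha, beta_) prior: (actual + virtual) success fraction."""
    virtual_s = alpha - 1          # virtual successes S'
    virtual_n = alpha + beta_ - 2  # virtual sample count N'
    return (s + virtual_s) / (n + virtual_n)

n, s = 10, 7  # same hypothetical data

print(map_estimate(s, n, alpha=2, beta_=1))  # p1: one virtual Head -> 8/11
print(map_estimate(s, n, alpha=1, beta_=2))  # p0: one virtual Tail -> 7/11
```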