Prerequisite Knowledge
- Poisson Distribution: A discrete probability distribution expressing the probability of a given number of events occurring in a fixed interval. PMF: $P(k \mid \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$.
- Mixture Model: A probabilistic model for representing the presence of subpopulations within an overall population.
- Expectation-Maximization (EM) Algorithm: An iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
- E-step: Calculate the expected value of the latent variables given the observed data and the current parameters.
- M-step: Update parameters to maximize the likelihood given the expected latent variables.
Step-by-Step Answer
1. Log-Likelihood Function
Let the latent variables be $Z = \{z_1, \dots, z_n\}$, where $z_i \in \{1, \dots, K\}$ indicates which component generated $x_i$; equivalently, $z_i$ can be represented as a one-hot vector $z_i = [z_{i1}, \dots, z_{iK}]^T$.
The complete-data log-likelihood is:
$$\ln p(X, Z \mid \theta) = \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij} \ln\left(\pi_j P(x_i \mid \lambda_j)\right)$$
$$= \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij} \left[\ln \pi_j + \ln\left(\frac{1}{x_i!} e^{-\lambda_j} \lambda_j^{x_i}\right)\right]$$
$$= \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij} \left[\ln \pi_j - \lambda_j + x_i \ln \lambda_j - \ln(x_i!)\right]$$
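As a quick numerical check, the final expression can be evaluated directly. Below is a minimal Python sketch (the function and argument names are my own, not from the source); `math.lgamma(x + 1)` computes $\ln(x!)$:

```python
import math

def complete_data_log_likelihood(x, z, pi, lam):
    """Complete-data log-likelihood ln p(X, Z | theta) for a Poisson mixture.

    x:   observed counts x_i
    z:   hard component assignments z_i in {0, ..., K-1}
    pi:  mixing weights pi_j (summing to 1)
    lam: Poisson rates lambda_j
    """
    total = 0.0
    for xi, zi in zip(x, z):
        # ln pi_j - lambda_j + x_i ln lambda_j - ln(x_i!)
        total += (math.log(pi[zi]) - lam[zi]
                  + xi * math.log(lam[zi]) - math.lgamma(xi + 1))
    return total
```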
2. E-step (Expectation)
We calculate the expected value of the latent variables $z_{ij}$ given the observed data $X$ and the current parameters $\theta^{(t)}$. This quantity is commonly denoted $\gamma_{ij}$ (the responsibility):
$$\gamma_{ij}^{(t)} = E[z_{ij} \mid x_i, \theta^{(t)}] = P(z_{ij} = 1 \mid x_i, \theta^{(t)})$$
Using Bayes' theorem:
$$\gamma_{ij}^{(t)} = \frac{\pi_j^{(t)} P(x_i \mid \lambda_j^{(t)})}{\sum_{l=1}^{K} \pi_l^{(t)} P(x_i \mid \lambda_l^{(t)})}$$
where $P(x_i \mid \lambda) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}$.
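The E-step can be sketched in Python as follows (`poisson_pmf` and `e_step` are illustrative names, not from the source; the PMF is evaluated via logs to avoid overflow in $x_i!$):

```python
import math

def poisson_pmf(x, lam):
    """P(x | lambda) = e^{-lambda} lambda^x / x!, computed in log space."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def e_step(x, pi, lam):
    """Responsibilities gamma[i][j] = pi_j P(x_i|lam_j) / sum_l pi_l P(x_i|lam_l)."""
    gamma = []
    for xi in x:
        weights = [pj * poisson_pmf(xi, lj) for pj, lj in zip(pi, lam)]
        total = sum(weights)  # normalizer: sum over all K components
        gamma.append([w / total for w in weights])
    return gamma
```

Each row of `gamma` sums to 1, since every observation is assigned (softly) to exactly one component.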
3. M-step (Maximization)
We maximize the expectation of the complete-data log-likelihood with respect to θ. The Q-function is:
$$Q(\theta, \theta^{(t)}) = \sum_{i=1}^{n} \sum_{j=1}^{K} \gamma_{ij}^{(t)} \left[\ln \pi_j - \lambda_j + x_i \ln \lambda_j + \text{const}\right]$$
Update $\pi_j$:
We need to maximize $\sum_{i=1}^{n} \sum_{j=1}^{K} \gamma_{ij}^{(t)} \ln \pi_j$ subject to $\sum_j \pi_j = 1$. Using a Lagrange multiplier, we obtain:
$$\pi_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ij}^{(t)} = \frac{N_j}{n}$$
where $N_j = \sum_{i=1}^{n} \gamma_{ij}^{(t)}$ is the effective number of data points assigned to component $j$.
Update $\lambda_j$:
We take the derivative of the Q-function with respect to $\lambda_j$ and set it to zero:
$$\frac{\partial Q}{\partial \lambda_j} = \sum_{i=1}^{n} \gamma_{ij}^{(t)} \left[-1 + \frac{x_i}{\lambda_j}\right] = 0$$
$$-\sum_{i=1}^{n} \gamma_{ij}^{(t)} + \frac{1}{\lambda_j} \sum_{i=1}^{n} \gamma_{ij}^{(t)} x_i = 0$$
Since $\sum_{i=1}^{n} \gamma_{ij}^{(t)} = N_j$:
$$-N_j + \frac{1}{\lambda_j} \sum_{i=1}^{n} \gamma_{ij}^{(t)} x_i = 0$$
$$\lambda_j^{(t+1)} = \frac{\sum_{i=1}^{n} \gamma_{ij}^{(t)} x_i}{N_j} = \frac{\sum_{i=1}^{n} \gamma_{ij}^{(t)} x_i}{\sum_{i=1}^{n} \gamma_{ij}^{(t)}}$$
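Both update rules can be sketched together as a single M-step; the function name and the list-of-lists layout for the responsibilities are assumptions for illustration:

```python
def m_step(x, gamma):
    """Update pi_j and lambda_j from responsibilities gamma[i][j]."""
    n = len(x)
    K = len(gamma[0])
    # Effective counts N_j = sum_i gamma_ij
    N = [sum(gamma[i][j] for i in range(n)) for j in range(K)]
    # pi_j = N_j / n
    pi = [N[j] / n for j in range(K)]
    # lambda_j = (sum_i gamma_ij x_i) / N_j  (responsibility-weighted mean)
    lam = [sum(gamma[i][j] * x[i] for i in range(n)) / N[j] for j in range(K)]
    return pi, lam
```

With hard (0/1) responsibilities, this reduces to the per-cluster sample means, matching the derivation above.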
Relation to ML Estimate (Problem 2.1)
In Problem 2.1 (standard Poisson ML estimation), the maximum likelihood estimate for $\lambda$ is simply the sample mean:
$$\lambda_{ML} = \frac{\sum_{i=1}^{n} x_i}{n}$$
The M-step estimate for component $j$, $\lambda_j^{(t+1)}$, is a weighted mean of the samples.
- Instead of each sample contributing equally (weight $1/n$), each sample $x_i$ is weighted by its responsibility $\gamma_{ij}^{(t)}$ (the probability that sample $i$ belongs to component $j$).
- The denominator $N_j$ is the sum of these weights, normalizing the estimate.
Thus, the M-step performs a weighted Maximum Likelihood Estimation for each component separately, based on the current soft assignment of points to that component.
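As a quick check of this relationship: with a single component, every responsibility $\gamma_{i1}$ equals 1, and the weighted mean collapses to the ordinary sample mean of Problem 2.1 (the values below are an arbitrary illustrative example):

```python
# One component: all responsibilities are 1, so the M-step
# update equals the plain ML estimate (the sample mean).
x = [2, 4, 6]
gamma = [1.0] * len(x)
lam = sum(g * xi for g, xi in zip(gamma, x)) / sum(gamma)  # weighted mean
```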