- Expectation-Maximization (EM) Algorithm: An iterative method to find maximum likelihood estimates of parameters in statistical models, particularly when the model depends on unobserved latent variables.
- Poisson Distribution: A discrete probability distribution expressing the probability of a given number of events occurring in a fixed interval.
- Complete-Data Log-Likelihood: The log-likelihood of both the observed data and the unobserved latent variables, i.e., the likelihood we would maximize if the latent assignments were known.
Step-by-Step Derivation
Let $X = \{x_1, \dots, x_n\}$ be our set of observations, where each $x_i \in \{0, 1, 2, \dots\}$.
Let $Z = \{z_1, \dots, z_n\}$ be the unobserved latent variables (cluster assignments), where $z_i \in \{1, \dots, K\}$ indicates which Poisson component generated $x_i$.
The probabilities of the latent variables are defined by the mixing weights: $p(z_i = j) = \pi_j$, with $\sum_{j=1}^{K} \pi_j = 1$.
The probability of an observation $x_i$ given that it comes from component $j$ is:

$$p(x_i \mid z_i = j, \theta) = \frac{e^{-\lambda_j} \lambda_j^{x_i}}{x_i!}$$
The complete-data log-likelihood for one observation $(x_i, z_i)$ is:

$$\log p(x_i, z_i = j \mid \theta) = \log\big(p(z_i = j)\, p(x_i \mid z_i = j, \theta)\big) = \log \pi_j - \lambda_j + x_i \log \lambda_j - \log(x_i!)$$
Hence, the total complete-data log-likelihood for all $n$ samples, typically written using indicator functions $\mathbb{I}(z_i = j)$, is:

$$\mathcal{L}_c(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{K} \mathbb{I}(z_i = j)\big(\log \pi_j - \lambda_j + x_i \log \lambda_j - \log(x_i!)\big)$$
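As a quick sanity check, the complete-data log-likelihood above can be evaluated directly when the assignments are known. The function name `complete_data_log_likelihood` is an illustrative choice, not part of the derivation:

```python
import math

def complete_data_log_likelihood(x, z, pi, lam):
    """Complete-data log-likelihood L_c for a Poisson mixture.

    x   : list of observed counts x_i
    z   : list of cluster assignments z_i in {0, ..., K-1}
    pi  : list of mixing weights pi_j (summing to 1)
    lam : list of Poisson rates lambda_j
    """
    total = 0.0
    for xi, zi in zip(x, z):
        # log pi_j - lambda_j + x_i log lambda_j - log(x_i!)
        total += (math.log(pi[zi]) - lam[zi]
                  + xi * math.log(lam[zi]) - math.lgamma(xi + 1))
    return total
```

Note that $\log(x_i!)$ is computed as `math.lgamma(x_i + 1)`, which avoids overflow for large counts.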
1. The E-step (Expectation)
In the E-step, we compute the expected value of the complete-data log-likelihood under the posterior distribution of the latent variables $Z$, given the current parameter estimates $\theta^{(t)}$.
This reduces to taking the expectation of the indicator variables $\mathbb{I}(z_i = j)$, which yields the posterior responsibilities $\gamma_{ij}$:

$$\gamma_{ij} = p(z_i = j \mid x_i, \theta^{(t)}) = \frac{p(z_i = j \mid \theta^{(t)})\, p(x_i \mid z_i = j, \theta^{(t)})}{\sum_{m=1}^{K} p(z_i = m \mid \theta^{(t)})\, p(x_i \mid z_i = m, \theta^{(t)})}$$
Substituting the Poisson density:

$$\gamma_{ij}^{(t)} = \frac{\pi_j^{(t)} \frac{e^{-\lambda_j^{(t)}} (\lambda_j^{(t)})^{x_i}}{x_i!}}{\sum_{m=1}^{K} \pi_m^{(t)} \frac{e^{-\lambda_m^{(t)}} (\lambda_m^{(t)})^{x_i}}{x_i!}} = \frac{\pi_j^{(t)} e^{-\lambda_j^{(t)}} (\lambda_j^{(t)})^{x_i}}{\sum_{m=1}^{K} \pi_m^{(t)} e^{-\lambda_m^{(t)}} (\lambda_m^{(t)})^{x_i}}$$
We arrive at the auxiliary function $Q(\theta, \theta^{(t)})$ to be maximized in the next step:

$$Q(\theta, \theta^{(t)}) = \sum_{i=1}^{n} \sum_{j=1}^{K} \gamma_{ij}^{(t)} \big(\log \pi_j - \lambda_j + x_i \log \lambda_j - \log(x_i!)\big)$$
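The responsibility computation above can be sketched as follows. Working in log space with a max-shift (log-sum-exp) keeps the ratio numerically stable; the $x_i!$ term is omitted because it cancels between numerator and denominator. The function name `e_step` is an illustrative choice:

```python
import math

def e_step(x, pi, lam):
    """Posterior responsibilities gamma_ij = p(z_i = j | x_i, theta)."""
    gammas = []
    for xi in x:
        # Unnormalized log weights: log pi_j - lambda_j + x_i log lambda_j
        # (the log(x_i!) term cancels in the ratio, as in the derivation)
        logw = [math.log(pi[j]) - lam[j] + xi * math.log(lam[j])
                for j in range(len(pi))]
        m = max(logw)  # subtract the max before exponentiating (stability)
        w = [math.exp(l - m) for l in logw]
        s = sum(w)
        gammas.append([wi / s for wi in w])
    return gammas
```

Each row of the result sums to 1, as a posterior over the $K$ components must.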
2. The M-step (Maximization)
In the M-step, we maximize $Q(\theta, \theta^{(t)})$ with respect to the parameters $\theta = (\pi, \lambda)$.
Maximizing for $\pi_j$ (Mixing proportions):
We maximize using a Lagrange multiplier to enforce the constraint $\sum_{j=1}^{K} \pi_j = 1$:

$$\mathcal{L}(\pi, \alpha) = \sum_{i=1}^{n} \sum_{j=1}^{K} \gamma_{ij}^{(t)} \log \pi_j + \alpha\Big(1 - \sum_{j=1}^{K} \pi_j\Big)$$
Taking the derivative w.r.t. $\pi_j$ and equating it to zero:

$$\frac{\partial \mathcal{L}(\pi, \alpha)}{\partial \pi_j} = \frac{\sum_{i=1}^{n} \gamma_{ij}^{(t)}}{\pi_j} - \alpha = 0 \implies \pi_j = \frac{1}{\alpha} \sum_{i=1}^{n} \gamma_{ij}^{(t)}$$
Summing over all $j$ and using $\sum_{j=1}^{K} \pi_j = 1$ together with $\sum_{j=1}^{K} \gamma_{ij}^{(t)} = 1$, we find $\alpha = n$. Let $N_j = \sum_{i=1}^{n} \gamma_{ij}^{(t)}$ be the expected number of samples assigned to component $j$. The update rule is:

$$\pi_j^{(t+1)} = \frac{N_j}{n}$$
Maximizing for $\lambda_j$ (Poisson rates):
We separate the terms containing $\lambda_j$ in the $Q$ function and take the derivative w.r.t. $\lambda_j$:

$$\frac{\partial Q}{\partial \lambda_j} = \sum_{i=1}^{n} \gamma_{ij}^{(t)} \Big(-1 + \frac{x_i}{\lambda_j}\Big) = 0$$
Rearranging the expression to solve for $\lambda_j$:

$$\sum_{i=1}^{n} \gamma_{ij}^{(t)} = \frac{1}{\lambda_j} \sum_{i=1}^{n} \gamma_{ij}^{(t)} x_i$$

$$N_j = \frac{1}{\lambda_j} \sum_{i=1}^{n} \gamma_{ij}^{(t)} x_i \implies \lambda_j^{(t+1)} = \frac{\sum_{i=1}^{n} \gamma_{ij}^{(t)} x_i}{N_j}$$
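Both closed-form updates can be sketched in a few lines, given the responsibilities from the E-step. The function name `m_step` is an illustrative choice:

```python
def m_step(x, gammas):
    """Update (pi, lambda) from responsibilities gamma_ij.

    x      : list of observed counts x_i
    gammas : list of per-sample responsibility rows [gamma_i1, ..., gamma_iK]
    """
    n = len(x)
    K = len(gammas[0])
    # N_j = sum_i gamma_ij : expected number of samples in component j
    N = [sum(g[j] for g in gammas) for j in range(K)]
    # pi_j = N_j / n
    pi_new = [N[j] / n for j in range(K)]
    # lambda_j = responsibility-weighted mean of the data
    lam_new = [sum(g[j] * xi for g, xi in zip(gammas, x)) / N[j]
               for j in range(K)]
    return pi_new, lam_new
```

With hard (0/1) responsibilities this reduces to the per-cluster sample means, matching the single-Poisson MLE discussed below.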
Relation to MLE for a single Poisson
For a standard ML estimate of a single Poisson distribution (Problem 2.1), the update rule is $\lambda = \frac{1}{n} \sum_{i=1}^{n} x_i$ (the arithmetic mean of the data).
In the M-step here, the update rule $\lambda_j^{(t+1)} = \frac{\sum_{i=1}^{n} \gamma_{ij}^{(t)} x_i}{\sum_{i=1}^{n} \gamma_{ij}^{(t)}}$ is essentially a weighted ML estimate. Instead of every point contributing equally (weight 1), each observation $x_i$ contributes to component $j$ only fractionally, according to its responsibility $\gamma_{ij}$. The sum of weights $N_j$ replaces the total sample size $n$.
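The whole derivation can be collected into a small EM loop. This is a self-contained toy sketch, not a production implementation: the function name `poisson_mixture_em` is invented, the initialization is a simple deterministic spread of rates over the data range, and it runs a fixed number of iterations rather than checking convergence of the log-likelihood:

```python
import math

def poisson_mixture_em(x, K, n_iter=100):
    """Fit a K-component Poisson mixture by EM (illustrative sketch)."""
    n = len(x)
    # Simple deterministic initialization: equal weights, rates spread
    # across the observed range (a real implementation might use random
    # restarts and a log-likelihood-based stopping rule).
    lo, hi = min(x), max(x)
    pi = [1.0 / K] * K
    lam = [lo + (j + 1) * (hi - lo + 1) / (K + 1) for j in range(K)]
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ij; log(x_i!) cancels in the ratio.
        gammas = []
        for xi in x:
            logw = [math.log(pi[j]) - lam[j] + xi * math.log(lam[j])
                    for j in range(K)]
            m = max(logw)  # log-sum-exp shift for numerical stability
            w = [math.exp(l - m) for l in logw]
            s = sum(w)
            gammas.append([wi / s for wi in w])
        # M-step: pi_j = N_j / n, lambda_j = responsibility-weighted mean.
        N = [sum(g[j] for g in gammas) for j in range(K)]
        pi = [N[j] / n for j in range(K)]
        lam = [sum(g[j] * xi for g, xi in zip(gammas, x)) / N[j]
               for j in range(K)]
    return pi, lam
```

On well-separated count data (e.g. one group of counts near 2 and another near 14), the recovered rates settle near the per-group means, as the weighted-MLE interpretation above predicts.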