Prerequisites
- Expectation-Maximization (EM) Algorithm: An iterative method to find maximum likelihood estimates of parameters in statistical models, where the model depends on unobserved latent variables.
- Maximum Likelihood Estimation (MLE): A method of estimating the parameters of a statistical model given observations.
- Exponential Distribution: A continuous probability distribution parameterized by a rate parameter $\lambda > 0$. Its Probability Density Function (PDF) is $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$.
- Lagrange Multipliers: A strategy for finding the local maxima and minima of a function subject to equality constraints.
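For concreteness, the exponential PDF can be evaluated with a few lines of Python (the function name `exp_pdf` is just an illustrative choice, not from the text):

```python
import math

def exp_pdf(x, lam):
    """PDF of the exponential distribution: f(x; lam) = lam * exp(-lam * x) for x >= 0."""
    if x < 0:
        return 0.0  # the exponential distribution has no mass on negative values
    return lam * math.exp(-lam * x)
```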
Step-by-Step Derivation
To derive the EM algorithm for the mixture of exponentials, we first introduce a latent variable $z_i \in \{1, \dots, K\}$ for each sample $x_i$, which indicates the component from which $x_i$ was generated. Equivalently, $z_i$ can be represented as a one-hot vector with $z_{ij} = 1$ if $x_i$ belongs to component $j$, and $0$ otherwise.
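The generative story behind the latent variables can be sketched in Python using only the standard library (the function name and the seeding convention are illustrative assumptions): first draw a component $z_i$ from the mixture weights, then draw $x_i$ from that component's exponential distribution.

```python
import random

def sample_mixture(n, weights, rates, seed=0):
    """Draw n complete-data samples (x_i, z_i) from the exponential mixture:
    z_i ~ Categorical(weights), then x_i ~ Exponential(rates[z_i])."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.choices(range(len(weights)), weights=weights)[0]  # pick a component
        x = rng.expovariate(rates[z])                             # sample from it
        data.append((x, z))
    return data
```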
The complete-data log-likelihood for the dataset $(X, Z) = \{(x_i, z_i)\}_{i=1}^{n}$ is:
$$\ell_c(\theta) = \sum_{i=1}^{n} \log p(x_i, z_i \mid \theta) = \sum_{i=1}^{n} \sum_{j=1}^{K} \mathbb{I}(z_i = j) \log\!\left(\pi_j \lambda_j e^{-\lambda_j x_i}\right)$$
where $\mathbb{I}(z_i = j)$ is an indicator function that equals $1$ if $z_i = j$ and $0$ otherwise.
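The complete-data log-likelihood translates directly into code. The following Python sketch (function name is an illustrative assumption) takes the assignments `z` as known integer component indices, which is exactly the information EM must work without:

```python
import math

def complete_log_likelihood(x, z, weights, rates):
    """Sum of log(pi_{z_i} * lam_{z_i} * exp(-lam_{z_i} * x_i)) over all samples,
    where z[i] is the (observed) component index of sample x[i]."""
    total = 0.0
    for xi, zi in zip(x, z):
        total += math.log(weights[zi] * rates[zi]) - rates[zi] * xi
    return total
```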
1. E-Step (Expectation)
In the E-step, we compute the expected value of the latent variables $z_i$ given the data $X$ and the current parameter estimates $\theta^{(t)} = \{\pi_j^{(t)}, \lambda_j^{(t)}\}_{j=1}^{K}$.
We define the responsibility $\gamma_{ij}$ as the posterior probability that sample $x_i$ was generated by component $j$:
$$\gamma_{ij} = \mathbb{E}\!\left[\mathbb{I}(z_i = j) \mid x_i, \theta^{(t)}\right] = p(z_i = j \mid x_i, \theta^{(t)})$$
Using Bayes' theorem:
$$\gamma_{ij} = \frac{p(z_i = j \mid \theta^{(t)})\, p(x_i \mid z_i = j, \theta^{(t)})}{\sum_{m=1}^{K} p(z_i = m \mid \theta^{(t)})\, p(x_i \mid z_i = m, \theta^{(t)})}$$
Substituting the given distributions:
$$\gamma_{ij} = \frac{\pi_j^{(t)} \lambda_j^{(t)} e^{-\lambda_j^{(t)} x_i}}{\sum_{m=1}^{K} \pi_m^{(t)} \lambda_m^{(t)} e^{-\lambda_m^{(t)} x_i}}$$
We construct the expected complete-data log-likelihood (the Q-function), replacing the indicator function with its expectation $\gamma_{ij}$:
$$Q(\theta, \theta^{(t)}) = \sum_{i=1}^{n} \sum_{j=1}^{K} \gamma_{ij} \left(\log \pi_j + \log \lambda_j - \lambda_j x_i\right)$$
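The responsibility formula above can be sketched as a small Python function (the name `e_step` and the list-of-lists representation of $\gamma$ are illustrative choices):

```python
import math

def e_step(x, weights, rates):
    """Compute responsibilities gamma[i][j] = pi_j * lam_j * exp(-lam_j * x_i),
    normalized over components j so each row sums to 1."""
    gamma = []
    for xi in x:
        terms = [w * lam * math.exp(-lam * xi) for w, lam in zip(weights, rates)]
        s = sum(terms)  # denominator: sum over all K components
        gamma.append([t / s for t in terms])
    return gamma
```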
2. M-Step (Maximization)
In the M-step, we maximize the Q-function with respect to the parameters $\theta = \{\pi_j, \lambda_j\}_{j=1}^{K}$ to find the updated parameters $\theta^{(t+1)}$. Let $N_j = \sum_{i=1}^{n} \gamma_{ij}$ be the effective number of samples assigned to component $j$.
Updating $\pi_j$ (Mixture Weights):
We want to maximize $Q$ with respect to $\pi_j$ subject to the constraint $\sum_{j=1}^{K} \pi_j = 1$. We use a Lagrange multiplier $\alpha$ to enforce this constraint:
$$\mathcal{L}(\pi, \alpha) = \sum_{i=1}^{n} \sum_{j=1}^{K} \gamma_{ij} \log \pi_j + \alpha\left(1 - \sum_{j=1}^{K} \pi_j\right)$$
Taking the partial derivative with respect to $\pi_j$ and setting it to zero:
$$\frac{\partial \mathcal{L}}{\partial \pi_j} = \frac{1}{\pi_j} \sum_{i=1}^{n} \gamma_{ij} - \alpha = 0 \quad\Longrightarrow\quad \pi_j = \frac{N_j}{\alpha}$$
Summing over all $K$ components to solve for the Lagrange multiplier $\alpha$:
$$\sum_{j=1}^{K} \pi_j = \sum_{j=1}^{K} \frac{N_j}{\alpha} = \frac{1}{\alpha} \sum_{j=1}^{K} \sum_{i=1}^{n} \gamma_{ij} = \frac{n}{\alpha} = 1 \quad\Longrightarrow\quad \alpha = n$$
Thus, the update rule for the mixture weights is:
$$\pi_j^{(t+1)} = \frac{N_j}{n}$$
Updating $\lambda_j$ (Component Parameters):
We take the partial derivative of the Q-function with respect to $\lambda_j$ and set it to zero:
$$\frac{\partial Q}{\partial \lambda_j} = \sum_{i=1}^{n} \gamma_{ij} \left(\frac{1}{\lambda_j} - x_i\right) = 0$$
$$\frac{1}{\lambda_j} \sum_{i=1}^{n} \gamma_{ij} - \sum_{i=1}^{n} \gamma_{ij} x_i = 0$$
Replacing the sum of responsibilities with $N_j$:
$$\frac{N_j}{\lambda_j} = \sum_{i=1}^{n} \gamma_{ij} x_i$$
Rearranging for $\lambda_j$, we get the update rule:
$$\lambda_j^{(t+1)} = \frac{N_j}{\sum_{i=1}^{n} \gamma_{ij} x_i}$$
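Both M-step updates can be sketched together in one Python function (the name `m_step` and the plain-list data layout are illustrative assumptions):

```python
def m_step(x, gamma):
    """Given data x and responsibilities gamma[i][j], return updated
    mixture weights pi_j = N_j / n and rates lam_j = N_j / sum_i gamma[i][j] * x_i."""
    n, K = len(x), len(gamma[0])
    N = [sum(gamma[i][j] for i in range(n)) for j in range(K)]          # effective counts
    weighted_x = [sum(gamma[i][j] * x[i] for i in range(n)) for j in range(K)]
    weights = [N[j] / n for j in range(K)]
    rates = [N[j] / weighted_x[j] for j in range(K)]
    return weights, rates
```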
Summary of the Derived EM Algorithm
Given initial parameters $\theta^{(0)} = \{\pi_j^{(0)}, \lambda_j^{(0)}\}_{j=1}^{K}$, iterate the following steps until convergence:
- E-step: Calculate the responsibilities for $i = 1, \dots, n$ and $j = 1, \dots, K$:
$$\gamma_{ij} = \frac{\pi_j^{(t)} \lambda_j^{(t)} e^{-\lambda_j^{(t)} x_i}}{\sum_{m=1}^{K} \pi_m^{(t)} \lambda_m^{(t)} e^{-\lambda_m^{(t)} x_i}}$$
- M-step: Update the parameters using the responsibilities:
$$N_j = \sum_{i=1}^{n} \gamma_{ij}, \qquad \pi_j^{(t+1)} = \frac{N_j}{n}, \qquad \lambda_j^{(t+1)} = \frac{N_j}{\sum_{i=1}^{n} \gamma_{ij} x_i}$$
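Putting the two steps together, a minimal self-contained Python sketch of the full iteration might look like the following (the function name, the fixed iteration count, and the initialization strategy are assumptions for illustration; the derivation itself does not prescribe them):

```python
import math
import random

def em_exponential_mixture(x, K, iters=200, seed=0):
    """Fit a K-component exponential mixture to data x with the EM updates derived above."""
    rng = random.Random(seed)
    weights = [1.0 / K] * K
    # Initialize rates by randomly perturbing the method-of-moments estimate 1/mean(x).
    base = 1.0 / (sum(x) / len(x))
    rates = [base * (0.5 + rng.random()) for _ in range(K)]
    for _ in range(iters):
        # E-step: responsibilities gamma[i][j], normalized over components.
        gamma = []
        for xi in x:
            terms = [w * lam * math.exp(-lam * xi) for w, lam in zip(weights, rates)]
            s = sum(terms)
            gamma.append([t / s for t in terms])
        # M-step: effective counts N_j, then the weight and rate updates.
        N = [sum(g[j] for g in gamma) for j in range(K)]
        wx = [sum(g[j] * xi for g, xi in zip(gamma, x)) for j in range(K)]
        weights = [Nj / len(x) for Nj in N]
        rates = [N[j] / wx[j] for j in range(K)]
    return weights, rates
```

With $K = 1$ the loop reduces to the maximum likelihood estimate for a single exponential, $\hat{\lambda} = n / \sum_i x_i$, which makes a convenient sanity check.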