Explanation of the EM Derivation for Mixture of Exponentials

The Expectation-Maximization (EM) algorithm is a standard tool for finding maximum likelihood estimates in models with latent (hidden) variables. In a mixture model, the "hidden" variable is the identity of the component that generated each data point. We don't know which exponential distribution generated which $x_i$, so we have to estimate it probabilistically.

1. The E-Step: Guessing the Labels

The "Expectation" step is essentially asking: "Given our current parameter estimates $(\pi, \lambda)$, how likely is it that data point $x_i$ came from component $j$?"

This probability is called the responsibility, denoted $\gamma_{ij}$:

$$\gamma_{ij} = \frac{\pi_j \, p(x_i \mid \lambda_j)}{\sum_{k=1}^{K} \pi_k \, p(x_i \mid \lambda_k)}$$

  • The numerator $\pi_j \, p(x_i \mid \lambda_j)$ is the joint probability of picking component $j$ and then observing $x_i$.
  • The denominator is the total probability of observing $x_i$ across all possible components (law of total probability).
  • The result is a normalized probability: $\sum_j \gamma_{ij} = 1$ for each $i$.
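The E-step above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the source; the function name `e_step` and the toy inputs are my own, and the exponential density $p(x \mid \lambda) = \lambda e^{-\lambda x}$ is written out directly.

```python
import numpy as np

def e_step(x, pi, lam):
    """Compute responsibilities gamma[i, j] = P(component j | x_i).

    x:   (n,) data points
    pi:  (k,) mixing coefficients
    lam: (k,) rate parameters of the exponential components
    """
    # Numerator: joint probability pi_j * p(x_i | lambda_j),
    # with the exponential pdf p(x | lambda) = lambda * exp(-lambda * x).
    joint = pi[None, :] * lam[None, :] * np.exp(-np.outer(x, lam))  # (n, k)
    # Denominator: total probability of x_i, summed over components.
    return joint / joint.sum(axis=1, keepdims=True)

# Toy example: 4 points, 2 components with rates 5 and 0.5.
gamma = e_step(np.array([0.1, 0.5, 2.0, 3.0]),
               pi=np.array([0.5, 0.5]),
               lam=np.array([5.0, 0.5]))
# Each row of gamma sums to 1 across the components.
```

Working in log space (via `scipy.special.logsumexp`) is the numerically safer choice when rates or data values are large, but the direct form matches the derivation more closely.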

2. The M-Step: Updating the Parameters

The "Maximization" step asks: "Given our soft guesses about the labels ($\gamma_{ij}$), what are the best parameters?"

We maximize the Q-function, which is the expected complete-data log-likelihood. Taking the log turns products into sums, so the objective decomposes and each component's parameters can be optimized independently, weighted by the responsibilities.

Updating $\pi_j$ (Mixing Coefficients)

The update for $\pi_j$ is simply the average of the assignment probabilities:

$$\pi_j = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ij}$$

This means: "The probability of component $j$ is the average responsibility of component $j$ across all data points." If component 1 takes 30% of the responsibility for every point, then $\pi_1$ should be 0.3.
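In code, this update is a single column mean over the responsibility matrix. A minimal sketch with hand-picked toy responsibilities (not from the source):

```python
import numpy as np

# gamma: (n, k) responsibilities from the E-step (toy values here).
gamma = np.array([[0.9, 0.1],
                  [0.7, 0.3],
                  [0.2, 0.8]])

# pi_j = (1/n) * sum_i gamma_ij  ->  the per-column average.
pi = gamma.mean(axis=0)
# pi = [0.6, 0.4]; since each row of gamma sums to 1, pi sums to 1 too.
```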

Updating $\lambda_j$ (Rate Parameters)

For a standard exponential distribution, the MLE for $\lambda$ is $1/\text{mean}(x)$:

$$\hat\lambda_{\text{MLE}} = \frac{n}{\sum_{i=1}^{n} x_i}$$
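A quick sanity check of this single-exponential MLE on simulated data (a sketch of my own, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
# Draw a large sample from an exponential with true rate lambda = 3
# (NumPy parameterizes by the scale, which is 1/lambda).
x = rng.exponential(scale=1 / 3.0, size=100_000)

lam_mle = len(x) / x.sum()  # n / sum(x_i) = 1 / mean(x)
# lam_mle lands close to the true rate of 3 for a sample this large.
```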

In the mixture case, we have a weighted version of this.

  • The numerator $\sum_{i=1}^{n} \gamma_{ij}$ is the "effective number of points" assigned to component $j$ (often denoted $N_j$).
  • The denominator $\sum_{i=1}^{n} \gamma_{ij} x_i$ is the "weighted sum of values" assigned to component $j$.

So the update is effectively:

$$\lambda_j = \frac{1}{\text{weighted mean of } x \text{ for component } j} = \frac{N_j}{\sum_{i=1}^{n} \gamma_{ij} x_i}$$

This matches the intuition from the single exponential case, but weighted by how much each point belongs to that component.
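The rate update can be sketched the same way. Here I use hard (0/1) responsibilities as a sanity check, so the weighted mean collapses to an ordinary per-component mean; the function name `m_step_lambda` and the toy data are illustrative, not from the source.

```python
import numpy as np

def m_step_lambda(x, gamma):
    """lambda_j = N_j / sum_i(gamma_ij * x_i): inverse of the weighted mean."""
    N = gamma.sum(axis=0)      # effective counts N_j, shape (k,)
    return N / (gamma.T @ x)   # weighted sums in the denominator, shape (k,)

x = np.array([1.0, 2.0, 4.0])
gamma = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])  # hard assignments as a sanity check

lam = m_step_lambda(x, gamma)
# Component 0 owns {1, 2}: mean 1.5, so lambda_0 = 2/3.
# Component 1 owns {4}:    mean 4,   so lambda_1 = 1/4.
```

With soft responsibilities from a real E-step, the same two lines apply unchanged; alternating `e_step` and these M-step updates until the log-likelihood stabilizes is the full EM loop.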