Answer

Prerequisites

  • Mixture Models: Understanding how a probability density can be represented as a weighted sum of component densities.
  • Exponential Distribution: The PDF $p(x|\lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$.
  • Maximum Likelihood Estimation (MLE): Finding parameter values that maximize the likelihood function.
  • Expectation-Maximization (EM) Algorithm: An iterative method to find MLE or MAP estimates of parameters in statistical models, especially those with latent variables.
  • Jensen's Inequality: Used in the derivation of the lower bound for the EM algorithm.
  • Lagrange Multipliers: Handling equality-constrained optimization (specifically the constraint $\sum_j \pi_j = 1$).

Step-by-Step Derivation

1. Model Setup and Complete Data Likelihood

Let the observed data be $X = \{x_1, \dots, x_n\}$. We introduce latent variables $Z = \{z_1, \dots, z_n\}$, where $z_i \in \{1, \dots, K\}$ indicates which component generated $x_i$. Using a 1-of-K coding scheme, let $z_{ij} = 1$ if $x_i$ belongs to component $j$, and $0$ otherwise.

The complete data likelihood is:

$$P(X, Z | \theta) = \prod_{i=1}^n \prod_{j=1}^K \left[ \pi_j \, p(x_i | \lambda_j) \right]^{z_{ij}}$$

The complete data log-likelihood is:

$$\ln P(X, Z | \theta) = \sum_{i=1}^n \sum_{j=1}^K z_{ij} \left( \ln \pi_j + \ln(\lambda_j e^{-\lambda_j x_i}) \right)$$

$$\ln P(X, Z | \theta) = \sum_{i=1}^n \sum_{j=1}^K z_{ij} \left( \ln \pi_j + \ln \lambda_j - \lambda_j x_i \right)$$
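
To make the setup concrete, here is a minimal sketch that simulates data from a two-component exponential mixture by first drawing each latent label $z_i$ and then drawing $x_i$ from the selected component. It assumes NumPy, and the parameter values (`pi_true`, `lam_true`) are illustrative, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "true" parameters (assumptions for this sketch)
pi_true = np.array([0.3, 0.7])   # mixing weights pi_j (sum to 1)
lam_true = np.array([0.5, 2.0])  # component rates lambda_j

n = 1000
z = rng.choice(len(pi_true), size=n, p=pi_true)  # latent labels z_i
x = rng.exponential(scale=1.0 / lam_true[z])     # x_i ~ Exp(lambda_{z_i}); NumPy's scale = 1/lambda
```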

2. E-step (Expectation)

We calculate the expected value of the latent variables $z_{ij}$ given the current parameters $\theta^{(t)}$. This is the posterior probability (responsibility) that data point $x_i$ was generated by component $j$:

$$\gamma_{ij}^{(t)} = E[z_{ij} | x_i, \theta^{(t)}] = P(z_i = j | x_i, \theta^{(t)})$$

Using Bayes' theorem:

$$\gamma_{ij}^{(t)} = \frac{\pi_j^{(t)} p(x_i | \lambda_j^{(t)})}{\sum_{k=1}^K \pi_k^{(t)} p(x_i | \lambda_k^{(t)})}$$

Substituting the exponential density:

$$\gamma_{ij}^{(t)} = \frac{\pi_j^{(t)} \lambda_j^{(t)} e^{-\lambda_j^{(t)} x_i}}{\sum_{k=1}^K \pi_k^{(t)} \lambda_k^{(t)} e^{-\lambda_k^{(t)} x_i}}$$
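
This formula translates directly into vectorized code. The sketch below assumes the arrays `x`, `pi_`, and `lam` from the simulation sketch above, and works in log space so that widely separated rates do not underflow:

```python
def e_step(x, pi_, lam):
    """Responsibilities gamma[i, j] = P(z_i = j | x_i, theta)."""
    # log(pi_j) + log(lambda_j) - lambda_j * x_i, shape (n, K)
    log_w = np.log(pi_) + np.log(lam) - np.outer(x, lam)
    # subtract each row's max before exponentiating (log-sum-exp trick)
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1
```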

3. M-step (Maximization)

We maximize the expected complete-data log-likelihood (the Q-function) with respect to the parameters $\theta = \{\pi_j, \lambda_j\}$. The Q-function is:

$$Q(\theta, \theta^{(t)}) = E_Z [\ln P(X, Z | \theta) \mid X, \theta^{(t)}] = \sum_{i=1}^n \sum_{j=1}^K \gamma_{ij}^{(t)} \left( \ln \pi_j + \ln \lambda_j - \lambda_j x_i \right)$$

Optimization w.r.t. $\pi_j$

We need to maximize $\sum_{i=1}^n \sum_{j=1}^K \gamma_{ij}^{(t)} \ln \pi_j$ subject to the constraint $\sum_{j=1}^K \pi_j = 1$. Using Lagrange multipliers, the Lagrangian is:

$$L(\pi, \alpha) = \sum_{i=1}^n \sum_{j=1}^K \gamma_{ij}^{(t)} \ln \pi_j + \alpha \left( \sum_{j=1}^K \pi_j - 1 \right)$$

Taking the derivative with respect to $\pi_j$ and setting it to 0:

$$\frac{\partial L}{\partial \pi_j} = \sum_{i=1}^n \frac{\gamma_{ij}^{(t)}}{\pi_j} + \alpha = 0 \implies \pi_j = -\frac{\sum_{i=1}^n \gamma_{ij}^{(t)}}{\alpha}$$

Summing over $j$ and using the fact that $\sum_{j=1}^K \gamma_{ij}^{(t)} = 1$ for each $i$:

$$\sum_{j=1}^K \pi_j = -\frac{\sum_{i=1}^n \sum_{j=1}^K \gamma_{ij}^{(t)}}{\alpha} \implies 1 = -\frac{n}{\alpha} \implies \alpha = -n$$

Thus:

$$\pi_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^n \gamma_{ij}^{(t)}$$

Optimization w.r.t. $\lambda_j$

We consider the terms in $Q$ involving $\lambda_j$:

$$Q_{\lambda_j} = \sum_{i=1}^n \gamma_{ij}^{(t)} (\ln \lambda_j - \lambda_j x_i)$$

Taking the derivative with respect to $\lambda_j$ and setting it to 0:

$$\frac{\partial Q}{\partial \lambda_j} = \sum_{i=1}^n \gamma_{ij}^{(t)} \left( \frac{1}{\lambda_j} - x_i \right) = 0$$

$$\frac{1}{\lambda_j} \sum_{i=1}^n \gamma_{ij}^{(t)} = \sum_{i=1}^n \gamma_{ij}^{(t)} x_i$$

$$\lambda_j^{(t+1)} = \frac{\sum_{i=1}^n \gamma_{ij}^{(t)}}{\sum_{i=1}^n \gamma_{ij}^{(t)} x_i}$$
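
Both closed-form updates are one line each in code. This sketch assumes the responsibility matrix `gamma` produced by the hypothetical `e_step` above:

```python
def m_step(x, gamma):
    """Closed-form M-step updates for pi_j and lambda_j."""
    nj = gamma.sum(axis=0)     # effective sample size of component j
    pi_ = nj / len(x)          # pi_j = (1/n) * sum_i gamma_ij
    lam = nj / (gamma.T @ x)   # lambda_j = sum_i gamma_ij / sum_i gamma_ij * x_i
    return pi_, lam
```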

Final EM Algorithm

  1. Initialize: Choose initial values for $\pi_j^{(0)}$ and $\lambda_j^{(0)}$.
  2. E-step: Calculate responsibilities: $\gamma_{ij}^{(t)} = \dfrac{\pi_j^{(t)} \lambda_j^{(t)} e^{-\lambda_j^{(t)} x_i}}{\sum_{k=1}^K \pi_k^{(t)} \lambda_k^{(t)} e^{-\lambda_k^{(t)} x_i}}$
  3. M-step: Update parameters: $\pi_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^n \gamma_{ij}^{(t)}$ and $\lambda_j^{(t+1)} = \dfrac{\sum_{i=1}^n \gamma_{ij}^{(t)}}{\sum_{i=1}^n \gamma_{ij}^{(t)} x_i}$
  4. Iterate: Repeat steps 2 and 3 until convergence (e.g., until the change in the observed-data log-likelihood falls below a tolerance).
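
Putting the pieces together, here is a minimal driver loop (a sketch reusing the hypothetical `e_step` and `m_step` above; the initialization and stopping rule are common choices, not the only ones):

```python
def em_exponential_mixture(x, K, n_iter=200, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    pi_ = np.full(K, 1.0 / K)                          # uniform initial weights
    lam = 1.0 / (x.mean() * rng.uniform(0.5, 1.5, K))  # perturbed initial rates
    prev_ll = -np.inf
    for _ in range(n_iter):
        gamma = e_step(x, pi_, lam)                    # E-step
        pi_, lam = m_step(x, gamma)                    # M-step
        # observed-data log-likelihood, which EM never decreases
        ll = np.log((pi_ * lam * np.exp(-np.outer(x, lam))).sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi_, lam
```

On the simulated data above, `em_exponential_mixture(x, K=2)` should return estimates close to `pi_true` and `lam_true`, up to a permutation of the component labels.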