Answer

Prerequisite Knowledge

  • Poisson Distribution: A discrete probability distribution expressing the probability of a given number of events occurring in a fixed interval. PMF: $P(k|\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$.
  • Mixture Model: A probabilistic model for representing the presence of subpopulations within an overall population.
  • Expectation-Maximization (EM) Algorithm: An iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
    • E-step: Calculate the expected value of the latent variables given current parameters.
    • M-step: Update parameters to maximize the likelihood given the expected latent variables.
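
For concreteness, the Poisson PMF above can be evaluated directly in Python (a minimal sketch; the helper name `poisson_pmf` is ours, not part of the problem statement):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(k | lambda) = lambda^k * exp(-lambda) / k!  (Poisson PMF)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# The probabilities over k = 0, 1, 2, ... sum to 1 (truncated here at k = 50).
total = sum(poisson_pmf(k, 3.0) for k in range(50))
```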

Step-by-Step Answer

1. Log-Likelihood Function

Let the latent variables be $Z = \{z_1, \dots, z_n\}$, where $z_i \in \{1, \dots, K\}$ indicates which component generated $x_i$; equivalently, $z_i$ can be written as a one-hot vector $z_i = [z_{i1}, \dots, z_{iK}]^T$.

The complete-data log-likelihood is:

$$\begin{aligned}
\ln p(X, Z|\theta) &= \sum_{i=1}^n \sum_{j=1}^K z_{ij} \ln\big(\pi_j P(x_i|\lambda_j)\big) \\
&= \sum_{i=1}^n \sum_{j=1}^K z_{ij} \left[\ln \pi_j + \ln\left(\frac{1}{x_i!}\, e^{-\lambda_j} \lambda_j^{x_i}\right)\right] \\
&= \sum_{i=1}^n \sum_{j=1}^K z_{ij} \left[\ln \pi_j - \lambda_j + x_i \ln \lambda_j - \ln(x_i!)\right]
\end{aligned}$$

2. E-step (Expectation)

We compute the expected value of the latent variables $z_{ij}$ given the observed data $X$ and the current parameters $\theta^{(t)}$. This quantity is commonly denoted $\gamma_{ij}$ (the responsibility):

$$\gamma_{ij}^{(t)} = E[z_{ij} \mid x_i, \theta^{(t)}] = P(z_{ij}=1 \mid x_i, \theta^{(t)})$$

Using Bayes' theorem:

$$\gamma_{ij}^{(t)} = \frac{\pi_j^{(t)} P(x_i|\lambda_j^{(t)})}{\sum_{l=1}^K \pi_l^{(t)} P(x_i|\lambda_l^{(t)})}$$

where $P(x_i|\lambda) = \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}$ is the Poisson PMF.
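
The E-step can be sketched as follows (the helper names `poisson_pmf` and `e_step` are illustrative; `pis` and `lams` hold the current mixture weights and rates):

```python
import math

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

def e_step(xs, pis, lams):
    """Return responsibilities gamma[i][j] = P(z_ij = 1 | x_i, theta) via Bayes' theorem."""
    gammas = []
    for x in xs:
        weighted = [pi * poisson_pmf(x, lam) for pi, lam in zip(pis, lams)]
        norm = sum(weighted)  # denominator: sum over components l
        gammas.append([w / norm for w in weighted])
    return gammas

gammas = e_step([0, 1, 5, 9], pis=[0.5, 0.5], lams=[1.0, 6.0])
```

Each row of `gammas` sums to 1, and a large count such as $x = 9$ is assigned mostly to the high-rate component.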

3. M-step (Maximization)

We maximize the expected complete-data log-likelihood with respect to $\theta$. The Q-function is:

$$Q(\theta, \theta^{(t)}) = \sum_{i=1}^n \sum_{j=1}^K \gamma_{ij}^{(t)} \left[\ln \pi_j - \lambda_j + x_i \ln \lambda_j + \text{const}\right]$$

Update $\pi_j$: We need to maximize $\sum_{i=1}^n \sum_{j=1}^K \gamma_{ij}^{(t)} \ln \pi_j$ subject to $\sum_j \pi_j = 1$. Using a Lagrange multiplier, we obtain:

$$\pi_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^n \gamma_{ij}^{(t)} = \frac{N_j}{n}$$

where $N_j = \sum_{i=1}^n \gamma_{ij}^{(t)}$ is the effective number of data points assigned to component $j$.

Update $\lambda_j$: We take the derivative of the Q-function with respect to $\lambda_j$ and set it to zero:

$$\frac{\partial Q}{\partial \lambda_j} = \sum_{i=1}^n \gamma_{ij}^{(t)} \left[-1 + \frac{x_i}{\lambda_j}\right] = 0 \quad\Longrightarrow\quad -\sum_{i=1}^n \gamma_{ij}^{(t)} + \frac{1}{\lambda_j} \sum_{i=1}^n \gamma_{ij}^{(t)} x_i = 0$$

Since $\sum_{i=1}^n \gamma_{ij}^{(t)} = N_j$:

$$-N_j + \frac{1}{\lambda_j} \sum_{i=1}^n \gamma_{ij}^{(t)} x_i = 0 \quad\Longrightarrow\quad \lambda_j^{(t+1)} = \frac{\sum_{i=1}^n \gamma_{ij}^{(t)} x_i}{N_j} = \frac{\sum_{i=1}^n \gamma_{ij}^{(t)} x_i}{\sum_{i=1}^n \gamma_{ij}^{(t)}}$$
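
Both updates are weighted averages over the responsibilities; a minimal sketch (the function name `m_step` is ours):

```python
def m_step(xs, gammas):
    """Update (pi_j, lambda_j) from responsibilities gamma[i][j]."""
    n = len(xs)
    K = len(gammas[0])
    N = [sum(gammas[i][j] for i in range(n)) for j in range(K)]  # effective counts N_j
    pis = [N[j] / n for j in range(K)]                           # pi_j = N_j / n
    lams = [sum(gammas[i][j] * xs[i] for i in range(n)) / N[j]   # weighted sample mean
            for j in range(K)]
    return pis, lams

# With hard assignments (gamma in {0, 1}) the update recovers per-cluster sample means:
xs = [1, 2, 7, 9]
gammas = [[1, 0], [1, 0], [0, 1], [0, 1]]
pis, lams = m_step(xs, gammas)  # pis == [0.5, 0.5], lams == [1.5, 8.0]
```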

Relation to ML Estimate (Problem 2.1)

In Problem 2.1 (the standard Poisson ML estimate), the maximum likelihood estimate for $\lambda$ is simply the sample mean:

$$\lambda_{ML} = \frac{\sum_{i=1}^n x_i}{n}$$

The M-step estimate for component $j$, $\lambda_j^{(t+1)}$, is a weighted mean of the samples.

  • Instead of each sample contributing equally (weight $1/n$), each sample $x_i$ is weighted by its responsibility $\gamma_{ij}^{(t)}$ (the probability that sample $i$ belongs to component $j$).
  • The denominator $N_j$ is the sum of these weights, normalizing the estimate.

Thus, the M-step performs a weighted Maximum Likelihood Estimation for each component separately, based on the current soft assignment of points to that component.
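
Putting both steps together, the full EM iteration for the Poisson mixture might look like this (a sketch under simplifying assumptions: fixed iteration count instead of a convergence check, and initial parameters supplied by the caller):

```python
import math

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

def em_poisson_mixture(xs, pis, lams, n_iter=50):
    """Alternate E- and M-steps; return final (pis, lams, log-likelihood)."""
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i][j] via Bayes' theorem
        gammas = []
        for x in xs:
            w = [pi * poisson_pmf(x, lam) for pi, lam in zip(pis, lams)]
            s = sum(w)
            gammas.append([wi / s for wi in w])
        # M-step: the weighted-mean updates derived above
        n, K = len(xs), len(pis)
        N = [sum(g[j] for g in gammas) for j in range(K)]
        pis = [N[j] / n for j in range(K)]
        lams = [sum(gammas[i][j] * xs[i] for i in range(n)) / N[j]
                for j in range(K)]
    ll = sum(math.log(sum(pi * poisson_pmf(x, lam)
                          for pi, lam in zip(pis, lams))) for x in xs)
    return pis, lams, ll

# Two well-separated groups of counts: rates should converge near 1.0 and 9.5.
xs = [0, 1, 1, 2, 8, 9, 10, 11]
pis, lams, ll = em_poisson_mixture(xs, pis=[0.5, 0.5], lams=[1.0, 5.0])
```

On this toy data the soft assignments become nearly hard, so each $\lambda_j$ converges to the mean of its group, mirroring the per-component weighted ML interpretation above.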