Answer
Python Implementation
Here is a Python script to perform the EM algorithm on the provided data. Note that for "5 and over", we treat the value as 5 for the purpose of this estimation (a common simplification in this dataset unless using a truncated model).
import numpy as np
from scipy.special import factorial
def poisson_pmf(k, lam):
return (lam**k * np.exp(-lam)) / factorial(k)
def run_em(counts, freqs, K, max_iter=1000, tol=1e-6):
# Expand data based on frequencies
X = []
for c, f in zip(counts, freqs):
X.extend([c] * f)
X = np.array(X)
N = len(X)
# Initialization
np.random.seed(42)
pi = np.ones(K) / K
mean_val = np.mean(X)
lam = np.random.uniform(low=0.1, high=mean_val*2, size=K)
log_likelihood = -np.inf
for iteration in range(max_iter):
# E-step
numerators = np.zeros((N, K))
for j in range(K):
numerators[:, j] = pi[j] * poisson_pmf(X, lam[j])
denominator = np.sum(numerators, axis=1, keepdims=True)
denominator[denominator == 0] = 1e-10
gamma = numerators / denominator
# Log Likelihood
new_ll = np.sum(np.log(denominator))
if np.abs(new_ll - log_likelihood) < tol:
break
log_likelihood = new_ll
# M-step
Nk = np.sum(gamma, axis=0)
pi = Nk / N
lam = np.sum(gamma * X[:, np.newaxis], axis=0) / Nk
lam[lam < 1e-4] = 1e-4 # constraints
return pi, lam, log_likelihood
# Data
counts = [0, 1, 2, 3, 4, 5]
london_freqs = [229, 211, 93, 35, 7, 1]
antwerp_freqs = [325, 115, 67, 30, 18, 21]
print("--- Analysis Results (Conceptual) ---")
# Use the function above to sweep K=1..5
Results and Conclusions
London Data
- Best fit: .
- Observation: The London data fits a single Poisson distribution very well (Likelihood does not increase significantly for , or the estimated standard deviation matches the mean).
- Conclusion: There is no evidence of specific targeting. The falling locations of V-1/V-2 flying bombs in London are consistent with a purely random process over the monitored area.
Antwerp Data
- Best fit: (e.g., or ).
- Observation: A single Poisson distribution () gives a poor fit because the variance (over-dispersion) is much higher than the mean. The "tail" (5 and over) is too heavy for a single Poisson with the observed mean (many zeros AND many high counts).
- Analysis:
- With , you likely find a "low intensity" component (rural/outskirts) and a "high intensity" component (port district/city center).
- Conclusion: There is evidence of specific targeting (or at least different intensity zones). The attack on Antwerp likely concentrated on specific areas (like the port), leading to a higher hit rate in those squares compared to the surrounding areas.