Explanation

In part (a), the objective function $\sum_j N_j \log \pi_j$ led to a solution in which $\pi_j$ was directly proportional to $N_j$, namely $\pi_j = N_j / \sum_k N_k$. This is the standard MLE for a multinomial distribution.
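For reference, the part (a) result follows from a short Lagrange-multiplier argument (a sketch, using the same symbols as above):

```latex
% Maximize \sum_j N_j \log \pi_j subject to \sum_j \pi_j = 1.
% Lagrangian: L = \sum_j N_j \log \pi_j - \lambda \Big( \sum_j \pi_j - 1 \Big)
\frac{\partial L}{\partial \pi_j} = \frac{N_j}{\pi_j} - \lambda = 0
\;\Rightarrow\; \pi_j = \frac{N_j}{\lambda}
% Enforcing the constraint gives \lambda = \sum_k N_k, hence
\pi_j = \frac{N_j}{\sum_k N_k}
```

Note the $1/\pi_j$ in the derivative: it is exactly this factor that makes the solution linear in $N_j$, in contrast to what happens in part (b).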

In part (b), the objective function is slightly different: $\sum_j \pi_j (N_j - \log \pi_j)$. Expanding, this is a linear term ($\sum_j \pi_j N_j$) plus an entropy term ($-\sum_j \pi_j \log \pi_j$).

When we maximize this function subject to the sum constraint $\sum_j \pi_j = 1$:

  1. Likelihood vs. entropy: The term $-\pi_j \log \pi_j$ is the entropy contribution; maximizing entropy alone pulls toward the uniform distribution. The term $\pi_j N_j$ weights each probability by $N_j$, pulling toward the components with larger counts.
  2. Exponential relationship:
    • The derivative of $\log x$ is $1/x$.
    • The derivative of $x \log x$ is $1 + \log x$.
    • Because the objective has the $\pi \log \pi$ form, its derivative contains a $\log \pi$ term (without the $1/\pi$ factor seen in part (a)).
    • To solve $\log \pi = C$ (where $C$ is a constant collecting the other terms, including the Lagrange multiplier), we exponentiate: $\pi = e^C$.
  3. Softmax function: The resulting form $\pi_j = \frac{\exp(N_j)}{\sum_k \exp(N_k)}$ is the well-known softmax function. It maps a vector of real numbers $N$ to a probability distribution proportional to the exponentials of its entries, and it is widely used in neural networks and machine learning to convert logits into probabilities.
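The whole argument can be checked numerically. The sketch below (with an arbitrary example vector $N$; any real values work as logits) computes the softmax and confirms that it beats random points on the simplex under the part (b) objective $\sum_j \pi_j (N_j - \log \pi_j)$:

```python
import numpy as np

# Hypothetical counts N_j; any real numbers work as softmax logits.
N = np.array([3.0, 1.0, 0.5, 2.0])

def softmax(N):
    """Numerically stable softmax: subtract max(N) before exponentiating."""
    z = np.exp(N - N.max())
    return z / z.sum()

def objective(pi, N):
    """Part (b) objective: sum_j pi_j * (N_j - log pi_j)."""
    return np.sum(pi * (N - np.log(pi)))

pi_star = softmax(N)
assert np.isclose(pi_star.sum(), 1.0)  # a valid probability distribution

# The objective (linear + entropy) is concave, so the stationary point found
# above is the global maximum: random distributions never score higher.
rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones_like(N))
    assert objective(p, N) <= objective(pi_star, N) + 1e-9
```

The `N - N.max()` shift changes nothing mathematically (it cancels in the ratio) but prevents `exp` overflow for large logits, which is the standard way softmax is implemented in practice.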