Explanation
In part (a), the objective function $\sum_i n_i \log p_i$ led to a solution where $p_i$ was linearly proportional to the count $n_i$. This is characteristic of the standard MLE for multinomial distributions: $p_i = n_i / \sum_j n_j$.
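As a concrete sketch of part (a)'s result (the counts here are made up for illustration), the multinomial MLE simply normalizes the observed counts:

```python
# Hypothetical counts n_i for a 3-outcome multinomial (made-up data)
n = [5, 2, 3]
total = sum(n)

# MLE for a multinomial: p_i = n_i / N, i.e. the probabilities
# are linearly proportional to the observed counts.
p = [ni / total for ni in n]
print(p)  # [0.5, 0.2, 0.3]
```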
In part (b), the objective function is slightly different: $\sum_i \left(-p_i \log p_i + a_i p_i\right)$. This combines an entropy term ($-\sum_i p_i \log p_i$) with a linear term ($\sum_i a_i p_i$).
When we maximize this function subject to the sum constraint $\sum_i p_i = 1$:
- Likelihood vs. Entropy: The $-p_i \log p_i$ term is the entropy. Maximizing entropy alone pushes toward a uniform distribution; the $a_i p_i$ term weights each probability by its score $a_i$, pulling mass toward outcomes with larger $a_i$.
- Exponential Relationship:
  - In part (a), the derivative of $n_i \log p_i$ with respect to $p_i$ is $n_i / p_i$.
  - In part (b), the derivative of $-p_i \log p_i + a_i p_i$ with respect to $p_i$ is $-\log p_i - 1 + a_i$.
  - Because the objective function has the $-p_i \log p_i$ form, the derivative has a $\log p_i$ term (without the $1/p_i$ scaling seen in part (a)).
  - To solve $\log p_i = a_i - C$ (where $C$ is some constant derived from the other terms, including the Lagrange multiplier for the sum constraint), we must use the exponential function: $p_i = e^{a_i - C} \propto e^{a_i}$.
- Softmax Function: The resulting form, $p_i = e^{a_i} / \sum_j e^{a_j}$, is famously known as the softmax function. It takes a vector of real numbers and turns it into a probability distribution proportional to the exponentials of the inputs. This is widely used in neural networks and machine learning to convert logits into probabilities.
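As a sanity check on the derivation, here is a minimal softmax sketch (the scores `a` are arbitrary assumptions, and the max-subtraction trick is a standard numerical-stability detail, not part of the derivation above). It confirms that the softmax of the scores attains a higher value of the part (b) objective than random points on the probability simplex:

```python
import math
import random

def softmax(a):
    # Subtract the max for numerical stability; softmax is invariant
    # to adding a constant to every input.
    m = max(a)
    exps = [math.exp(x - m) for x in a]
    z = sum(exps)
    return [e / z for e in exps]

def objective(p, a):
    # The part (b) objective: sum_i (-p_i * log(p_i) + a_i * p_i).
    return sum(-pi * math.log(pi) + ai * pi for pi, ai in zip(p, a))

a = [1.0, -0.5, 2.0]  # arbitrary example scores a_i
p_star = softmax(a)

# The softmax output is a valid probability distribution...
assert abs(sum(p_star) - 1.0) < 1e-9

# ...and it beats random points on the simplex, as predicted
# by the Lagrangian derivation above.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in a]
    s = sum(w)
    q = [wi / s for wi in w]
    assert objective(q, a) <= objective(p_star, a) + 1e-12
```

The probability mass concentrates on the largest score, but every outcome keeps nonzero probability, which is exactly the smoothing effect the entropy term contributes.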