Prerequisites
- Calculus: Product rule and differentiation of logarithms.
- Optimization: The Method of Lagrange Multipliers for constrained optimization problems.
- Probability Theory: The constraint that categorical probabilities must sum to 1.
- Softmax Function: The mathematical operation that turns an arbitrary vector of scalars into probabilities that sum to one.
Step-by-Step Derivation
Step 1: Define the objective and constraint functions
We want to maximize the new objective function:
$$f(\boldsymbol{\pi}) = \sum_{j=1}^{K} \pi_j \left( N_j - \log \pi_j \right)$$
subject to the equality constraint:
$$\sum_{j=1}^{K} \pi_j = 1 \quad\Longrightarrow\quad g(\boldsymbol{\pi}) = \sum_{j=1}^{K} \pi_j - 1 = 0$$
(Note: As with part (a), we temporarily ignore the inequality constraint $\pi_j \ge 0$. We will find a stationary point and verify that the resulting solution automatically produces non-negative values.)
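As a quick numerical illustration, the objective and constraint can be evaluated directly. This is a sketch using NumPy with made-up counts $N_j$ and a made-up feasible point; none of these values come from the original problem:

```python
import numpy as np

# Hypothetical counts N_j (any real values work for this sketch).
N = np.array([3.0, 1.0, 0.5])

def objective(pi, N):
    """f(pi) = sum_j pi_j * (N_j - log pi_j)."""
    return np.sum(pi * (N - np.log(pi)))

# A feasible point: strictly positive entries summing to one.
pi = np.array([0.5, 0.3, 0.2])
assert np.isclose(pi.sum(), 1.0)  # the equality constraint g(pi) = 0
print(objective(pi, N))
```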
Step 2: Form the Lagrangian
Using the standard formulation $L(\boldsymbol{\pi}, \lambda) = f(\boldsymbol{\pi}) - \lambda g(\boldsymbol{\pi})$, we construct the Lagrangian function:
$$L(\boldsymbol{\pi}, \lambda) = \sum_{j=1}^{K} \pi_j \left( N_j - \log \pi_j \right) - \lambda \left( \sum_{j=1}^{K} \pi_j - 1 \right)$$
Step 3: Find the stationary point with respect to $\pi_j$
We compute the partial derivative of $L$ with respect to a specific $\pi_j$, applying the product rule to the $-\pi_j \log \pi_j$ term:
$$\frac{\partial L}{\partial \pi_j} = \frac{\partial}{\partial \pi_j} \left( \pi_j N_j - \pi_j \log \pi_j - \lambda \pi_j \right)$$
$$\frac{\partial L}{\partial \pi_j} = N_j - \left( 1 \cdot \log \pi_j + \pi_j \cdot \frac{1}{\pi_j} \right) - \lambda$$
$$\frac{\partial L}{\partial \pi_j} = N_j - \log \pi_j - 1 - \lambda$$
Setting this derivative to zero:
$$N_j - \log \pi_j - 1 - \lambda = 0$$
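The product-rule computation can be cross-checked numerically. Since $\lambda$ enters $L$ linearly, it suffices to verify $\partial f / \partial \pi_j = N_j - \log \pi_j - 1$ by central finite differences; this sketch uses hypothetical values for $N$ and an arbitrary interior point:

```python
import numpy as np

N = np.array([2.0, 0.0, -1.0])    # hypothetical counts
pi = np.array([0.4, 0.35, 0.25])  # arbitrary interior point of the simplex

def f(p):
    """The objective f(p) = sum_j p_j * (N_j - log p_j)."""
    return np.sum(p * (N - np.log(p)))

eps = 1e-6
for j in range(len(pi)):
    e = np.zeros_like(pi)
    e[j] = eps
    numeric = (f(pi + e) - f(pi - e)) / (2 * eps)  # central difference
    analytic = N[j] - np.log(pi[j]) - 1            # the derivative above
    assert abs(numeric - analytic) < 1e-6
```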
Step 4: Express $\pi_j$ in terms of $\lambda$
Rearranging the equation to solve for $\pi_j$:
$$\log \pi_j = N_j - 1 - \lambda$$
Exponentiating both sides:
$$\pi_j = \exp(N_j - 1 - \lambda) = \exp(N_j) \cdot \exp(-1 - \lambda)$$
Step 5: Enforce the equality constraint to solve for $\lambda$
We substitute our expression for $\pi_j$ into the sum-to-one constraint:
$$\sum_{j=1}^{K} \pi_j = \sum_{j=1}^{K} \exp(N_j) \exp(-1 - \lambda) = 1$$
Since the term $\exp(-1 - \lambda)$ does not depend on the index $j$, we factor it out of the sum:
$$\exp(-1 - \lambda) \sum_{j=1}^{K} \exp(N_j) = 1$$
Solving for $\exp(-1 - \lambda)$:
$$\exp(-1 - \lambda) = \frac{1}{\sum_{k=1}^{K} \exp(N_k)}$$
*(We switch to the index $k$ in the denominator to avoid confusion during substitution.)*
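Taking logs of this relation gives $\lambda = \log \sum_k \exp(N_k) - 1$ explicitly. A short numerical check (with hypothetical counts $N$) confirms both the factored constraint and the stationarity condition from Step 3:

```python
import numpy as np

N = np.array([1.0, 2.0, 3.0])  # hypothetical counts

# From exp(-1 - lambda) = 1 / sum_k exp(N_k):
lam = np.log(np.sum(np.exp(N))) - 1

# The factored constraint from Step 5 holds:
assert np.isclose(np.exp(-1 - lam) * np.sum(np.exp(N)), 1.0)

# And the stationarity condition N_j - log(pi_j) - 1 - lambda = 0
# holds at pi_j = exp(N_j) * exp(-1 - lambda) for every j:
pi = np.exp(N) * np.exp(-1 - lam)
assert np.allclose(N - np.log(pi) - 1 - lam, 0.0)
```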
Step 6: Substitute back to find the final solution
Substitute the result from Step 5 back into the expression for $\pi_j$ derived in Step 4:
$$\pi_j = \exp(N_j) \cdot \exp(-1 - \lambda)$$
$$\pi_j = \exp(N_j) \cdot \frac{1}{\sum_{k=1}^{K} \exp(N_k)}$$
$$\pi_j = \frac{\exp(N_j)}{\sum_{k=1}^{K} \exp(N_k)}$$
Verification: Since the exponential function is strictly positive ($\exp(x) > 0$ for all real $x$), we have $\pi_j > 0$, so the inequality constraint $\pi_j \ge 0$ that we set aside in Step 1 is satisfied, in fact strictly. This final functional form is famously known as the Softmax function.
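Because $f$ is concave on the simplex (the $-\pi_j \log \pi_j$ terms are concave and the rest is linear), this stationary point is the global maximum. A numerical sketch with hypothetical counts checks that the softmax point is feasible and beats randomly sampled points on the simplex:

```python
import numpy as np

rng = np.random.default_rng(0)
N = np.array([2.0, -1.0, 0.5, 0.0])  # hypothetical counts

def f(p):
    """The objective f(p) = sum_j p_j * (N_j - log p_j)."""
    return np.sum(p * (N - np.log(p)))

def softmax(x):
    z = np.exp(x - x.max())  # shift by the max for numerical stability
    return z / z.sum()

pi_star = softmax(N)
# Feasibility: strictly positive and sums to one.
assert np.all(pi_star > 0) and np.isclose(pi_star.sum(), 1.0)

# f at the softmax point is at least f at random simplex points.
for _ in range(1000):
    p = rng.dirichlet(np.ones_like(N))
    assert f(pi_star) >= f(p) - 1e-12
```

Incidentally, the maximum value itself is $f(\boldsymbol{\pi}^*) = \log \sum_k \exp(N_k)$, since $N_j - \log \pi_j^*$ is the same constant for every $j$.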