Answer

Prerequisites

  • Calculus: Product rule and differentiation of logarithms.
  • Optimization: The Method of Lagrange Multipliers for constrained optimization problems.
  • Probability Theory: The constraint that categorical probabilities must sum to 1.
  • Softmax Function: The mathematical operation that turns arbitrary sets of scalars into probabilities summing to one.
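Since the softmax function is central to what follows, here is a minimal sketch of it in plain Python (the function name and input values are illustrative, not from the original problem):

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability;
    # this shifts every exponent equally and does not change the result.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three strictly positive values
print(sum(probs))  # 1.0 up to floating-point error
```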

Step-by-Step Derivation

Step 1: Define the objective and constraint functions

We want to maximize the new objective function:

$$f(\pi) = \sum_{j=1}^{K} \pi_j \left( N_j - \log \pi_j \right)$$

subject to the equality constraint:

$$\sum_{j=1}^{K} \pi_j = 1 \quad\Longrightarrow\quad g(\pi) = \sum_{j=1}^{K} \pi_j - 1 = 0$$

(Note: As in part (a), we temporarily ignore the inequality constraints $\pi_j \ge 0$. We will find a stationary point and then verify that the resulting solution is non-negative.)

Step 2: Form the Lagrangian

Using the standard formulation $L(\pi, \lambda) = f(\pi) - \lambda\, g(\pi)$, we construct the Lagrangian function:

$$L(\pi, \lambda) = \sum_{j=1}^{K} \pi_j \left( N_j - \log \pi_j \right) - \lambda \left( \sum_{j=1}^{K} \pi_j - 1 \right)$$

Step 3: Find the stationary point with respect to $\pi_j$

We compute the partial derivative of $L$ with respect to a specific $\pi_j$. Note that the $-\pi_j \log \pi_j$ term requires the product rule:

$$\frac{\partial L}{\partial \pi_j} = \frac{\partial}{\partial \pi_j}\left( \pi_j N_j - \pi_j \log \pi_j - \lambda \pi_j \right) = N_j - \left( 1 \cdot \log \pi_j + \pi_j \cdot \frac{1}{\pi_j} \right) - \lambda = N_j - \log \pi_j - 1 - \lambda$$

Setting this derivative to zero:

$$N_j - \log \pi_j - 1 - \lambda = 0$$
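As a sanity check, the analytic derivative above can be compared against a central finite difference of the Lagrangian. The numeric values below are arbitrary illustrative choices, not part of the original problem:

```python
import math

def lagrangian(pi, N, lam):
    # L(pi, lambda) = sum_j pi_j (N_j - log pi_j) - lambda * (sum_j pi_j - 1)
    return sum(p * (n - math.log(p)) for p, n in zip(pi, N)) - lam * (sum(pi) - 1)

N = [3.0, 1.0, 2.0]      # illustrative values
pi = [0.5, 0.2, 0.3]
lam = 0.7
j, h = 0, 1e-6

# Analytic partial derivative: N_j - log(pi_j) - 1 - lambda
analytic = N[j] - math.log(pi[j]) - 1 - lam

# Central finite difference in the j-th coordinate
pi_plus = pi.copy();  pi_plus[j] += h
pi_minus = pi.copy(); pi_minus[j] -= h
numeric = (lagrangian(pi_plus, N, lam) - lagrangian(pi_minus, N, lam)) / (2 * h)

print(abs(analytic - numeric))  # very small (finite-difference error only)
```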

Step 4: Express $\pi_j$ in terms of $\lambda$

Rearranging the equation to solve for $\pi_j$:

$$\log \pi_j = N_j - 1 - \lambda$$

Exponentiating both sides:

$$\pi_j = \exp(N_j - 1 - \lambda) = \exp(N_j) \cdot \exp(-1 - \lambda)$$

Step 5: Enforce the equality constraint to solve for $\lambda$

We substitute our expression for $\pi_j$ into the sum-to-one constraint:

$$\sum_{j=1}^{K} \pi_j = \sum_{j=1}^{K} \exp(N_j) \exp(-1 - \lambda) = 1$$

Since the term $\exp(-1 - \lambda)$ does not depend on the index $j$, we factor it out:

$$\exp(-1 - \lambda) \sum_{j=1}^{K} \exp(N_j) = 1$$

Solving for $\exp(-1 - \lambda)$:

$$\exp(-1 - \lambda) = \frac{1}{\sum_{k=1}^{K} \exp(N_k)}$$

*(We use $k$ for the index in the denominator to avoid confusion during substitution.)*

Step 6: Substitute back to find the final solution

Substitute the result from Step 5 back into the expression for $\pi_j$ derived in Step 4:

$$\pi_j = \exp(N_j) \cdot \exp(-1 - \lambda) = \exp(N_j) \cdot \frac{1}{\sum_{k=1}^{K} \exp(N_k)} = \frac{\exp(N_j)}{\sum_{k=1}^{K} \exp(N_k)}$$
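A quick numeric check of this closed form (the values of $N_j$ below are arbitrary): the resulting probabilities sum to one, and they agree with $\pi_j = \exp(N_j - 1 - \lambda)$ when $\lambda$ is recovered from Step 5 as $\lambda = \log\left(\sum_k \exp(N_k)\right) - 1$:

```python
import math

N = [4.0, 2.0, 1.0]  # illustrative values for N_j
Z = sum(math.exp(n) for n in N)
pi = [math.exp(n) / Z for n in N]
print(sum(pi))  # 1.0 up to floating-point error

# Recover lambda from Step 5: exp(-1 - lam) = 1/Z  =>  lam = log(Z) - 1
lam = math.log(Z) - 1
pi_check = [math.exp(n - 1 - lam) for n in N]
print(max(abs(a - b) for a, b in zip(pi, pi_check)))  # essentially zero
```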

Verification: Since the exponential function is strictly positive ($\exp(x) > 0$ for all real $x$), we have $\pi_j > 0$, so the non-negativity constraint we set aside is automatically satisfied. This final functional form is famously known as the Softmax function.
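Because $f$ is concave on the simplex (the $-\pi_j \log \pi_j$ terms are concave and the $\pi_j N_j$ terms are linear), the stationary point is a global maximum. A brute-force sketch of this claim, comparing the softmax solution against random points on the simplex (the values of $N_j$ and the sampling scheme are illustrative):

```python
import math, random

def f(pi, N):
    # Objective: sum_j pi_j (N_j - log pi_j)
    return sum(p * (n - math.log(p)) for p, n in zip(pi, N) if p > 0)

N = [3.0, 1.0, 2.0]  # illustrative values
Z = sum(math.exp(n) for n in N)
best = f([math.exp(n) / Z for n in N], N)

# At the softmax solution, f equals log(sum_k exp(N_k)) (the log-sum-exp of N).
print(best, math.log(Z))

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in N]
    s = sum(w)
    pi = [v / s for v in w]          # a random point on the simplex
    assert f(pi, N) <= best + 1e-12  # never exceeds the softmax value
```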