

Problem 4.12 Lagrange multipliers and equality constraints

In the M-step of the EM algorithm for mixture models, we need to calculate an estimate of the component prior probabilities $\pi_j$ via the optimization problem,

$$\{\hat{\pi}_j\} = \underset{\{\pi_j\}}{\text{argmax}} \sum_{j=1}^K N_j \log \pi_j, \quad \text{s.t.} \quad \sum_{j=1}^K \pi_j = 1, \quad \pi_j \ge 0,$$

(4.47)

for some $N_j \ge 0$. Note that this optimization problem has an equality constraint: the $\{\pi_j\}$ must sum to 1, since they represent a probability distribution.

One method of solving an optimization problem with equality constraints is to use Lagrange multipliers. Consider the following problem,

$$x^* = \underset{x}{\text{argmax}} \, f(x), \quad \text{s.t.} \quad g(x) = 0,$$

(4.48)

where $f(x)$ is the objective function and $g(x)$ is the constraint function. Let's look at two properties of these functions:

  • First, the gradient $\nabla g(x)$ is orthogonal to the constraint surface, since $g(x)$ must be constant along the constraint surface (otherwise it would not remain 0).
  • Second, at the optimum, the gradient $\nabla f(x)$ must also be orthogonal to the constraint surface. Otherwise, we could move along the constraint surface to increase $f(x)$.

This is illustrated in the following figure:

_(Figure description: A contour plot showing concentric circles, the level curves of an objective function $f(x) = 1 - x_1^2 - x_2^2$. A straight red line represents the constraint function $g(x) = x_1 + x_2 - 1 = 0$. The optimal point $(x_1^*, x_2^*) = (0.5, 0.5)$ is where the level curve of $f(x)$ attains the highest reachable value while being tangent to the constraint line $g(x)$. An arrow pointing toward the center represents the gradient $\nabla f(x)$, which is orthogonal to the constraint line at the optimal point.)_
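The tangency in the figure can be checked numerically. A minimal sketch, using the figure's $f(x) = 1 - x_1^2 - x_2^2$ and $g(x) = x_1 + x_2 - 1$ (two 2-D vectors are parallel exactly when their scalar cross product vanishes):

```python
# Numerical check of the tangency condition for the figure's example:
# f(x) = 1 - x1^2 - x2^2 (concentric level circles) and
# g(x) = x1 + x2 - 1 (the straight constraint line).

def grad_f(x1, x2):
    # ∇f = (-2*x1, -2*x2), pointing toward the unconstrained maximum at the origin
    return (-2.0 * x1, -2.0 * x2)

def grad_g(x1, x2):
    # ∇g = (1, 1), orthogonal to the line x1 + x2 = 1 everywhere
    return (1.0, 1.0)

x_opt = (0.5, 0.5)  # the constrained optimum shown in the figure

gf = grad_f(*x_opt)
gg = grad_g(*x_opt)

# Two 2-D vectors are parallel iff their scalar cross product vanishes.
cross = gf[0] * gg[1] - gf[1] * gg[0]
print(cross)  # 0.0, so ∇f and ∇g are (anti-)parallel at the optimum
```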

Hence, $\nabla f(x)$ and $\nabla g(x)$ must be parallel or anti-parallel, i.e.,

$$\nabla f(x) + \lambda \nabla g(x) = 0,$$

(4.49)

for some $\lambda \neq 0$. Define the Lagrangian function,

$$L(x, \lambda) = f(x) + \lambda g(x).$$

(4.50)

The optimality condition in (4.49) is obtained by setting $\frac{\partial L}{\partial x} = 0$. Furthermore, setting $\frac{\partial L}{\partial \lambda} = 0$ recovers the equality constraint $g(x) = 0$. Hence, to solve the constrained optimization problem (4.48), we form the Lagrangian function and find its stationary point w.r.t. both $x$ and $\lambda$, by simultaneously solving

$$\frac{\partial}{\partial x} L(x, \lambda) = 0, \quad \frac{\partial}{\partial \lambda} L(x, \lambda) = 0.$$

(4.51)
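For the figure's example, this recipe can be carried out by hand. A short sketch (the Lagrangian below is just the figure's $f$ and $g$ plugged into (4.50), not part of the problem statement):

```python
# Solving the stationary conditions (4.51) for the figure's example.
# L(x, λ) = 1 - x1^2 - x2^2 + λ(x1 + x2 - 1), so a stationary point satisfies
#   ∂L/∂x1 = -2*x1 + λ = 0   =>  x1 = λ/2
#   ∂L/∂x2 = -2*x2 + λ = 0   =>  x2 = λ/2
#   ∂L/∂λ  = x1 + x2 - 1 = 0

# Substituting x1 = x2 = λ/2 into the constraint gives λ/2 + λ/2 = 1, so λ = 1:
lam = 1.0
x1 = lam / 2.0
x2 = lam / 2.0

print((x1, x2, lam))  # (0.5, 0.5, 1.0), matching the figure's optimum

# Sanity check: all three partial derivatives vanish at this point.
assert -2.0 * x1 + lam == 0.0
assert -2.0 * x2 + lam == 0.0
assert x1 + x2 - 1.0 == 0.0
```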

(a) Use Lagrange multipliers to optimize (4.47), and show that the solution is

$$\hat{\pi}_j = \frac{N_j}{\sum_{k=1}^K N_k}.$$

(4.52)
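As a numerical sanity check of (4.52), the closed-form solution should score at least as well on the objective $\sum_j N_j \log \pi_j$ as any other point on the probability simplex. A quick sketch with made-up counts $N_j$ (the values below are arbitrary, only for illustration):

```python
import math
import random

# Arbitrary nonnegative counts N_j, chosen only for this check.
N = [3.0, 1.0, 6.0]

def objective(pi):
    # The objective of (4.47): sum_j N_j * log(pi_j)
    return sum(n * math.log(p) for n, p in zip(N, pi))

# The claimed optimum (4.52): normalize the counts.
pi_hat = [n / sum(N) for n in N]
print([round(p, 2) for p in pi_hat])  # [0.3, 0.1, 0.6]

# Compare against random points on the simplex: none should beat pi_hat.
random.seed(0)
for _ in range(1000):
    w = [random.random() + 1e-9 for _ in N]  # positive weights
    pi = [wi / sum(w) for wi in w]           # normalized to sum to 1
    assert objective(pi) <= objective(pi_hat) + 1e-12
```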