Answer

Prerequisites

  • Linear Discriminant Functions: From part (a), we know $g_k(x) = w_k^T x + b_k$ for any class $k$.
  • Decision Boundary: The boundary between two classes $i$ and $j$ is the set of points $x$ where the classifier is indifferent between the two classes, i.e. where their discriminant functions are equal: $g_i(x) = g_j(x)$.
  • Hyperplane Equation: A hyperplane in $d$-dimensional space is defined by $w^T x + b = 0$, where $w$ is the normal vector to the hyperplane and $b$ is the bias (or offset).
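To make the first prerequisite concrete, here is a minimal sketch of classification by linear discriminants: each class scores $x$ with $g_k(x) = w_k^T x + b_k$ and the largest score wins. The weights and biases below are hypothetical illustrative values, not parameters from the original problem.

```python
import numpy as np

# Hypothetical weights and biases for three classes in 2-D (illustrative only).
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])   # row k is w_k^T
b = np.array([0.0, 0.1, -0.2])  # b_k for each class

def classify(x):
    """Assign x to the class with the largest discriminant g_k(x) = w_k^T x + b_k."""
    scores = W @ x + b
    return int(np.argmax(scores))

# Scores for this point are [2.0, 0.6, -2.7], so class 0 wins.
print(classify(np.array([2.0, 0.5])))  # → 0
```

The decision boundary between any two of these classes is exactly the set of points where their two rows of `W @ x + b` tie, which is what the derivation below works out in closed form.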

Step-by-Step Derivation

  1. Set Discriminant Functions Equal: The decision boundary is defined by the condition that the scores for class $i$ and class $j$ are identical: $g_i(x) = g_j(x)$.

  2. Substitute the Linear Forms: Using the results from part (a), substitute the linear equations for $g_i(x)$ and $g_j(x)$: $w_i^T x + b_i = w_j^T x + b_j$.

  3. Rearrange into Hyperplane Form: Move all terms involving $x$ to one side and the constant terms to the other, matching the standard hyperplane equation $w^T x + b = 0$: $(w_i - w_j)^T x + (b_i - b_j) = 0$. Let $w = w_i - w_j$ and $b = b_i - b_j$.

  4. Derive the Expression for $w$: Substitute the definitions of $w_i$ and $w_j$ from part (a): $w = \Sigma^{-1}\mu_i - \Sigma^{-1}\mu_j$. Factor out $\Sigma^{-1}$: $w = \Sigma^{-1}(\mu_i - \mu_j)$. This matches the required expression for $w$.

  5. Derive the Expression for $b$: Substitute the definitions of $b_i$ and $b_j$ from part (a): $b = \left( -\frac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \log \pi_i \right) - \left( -\frac{1}{2}\mu_j^T\Sigma^{-1}\mu_j + \log \pi_j \right)$. Group the quadratic terms and the logarithmic terms: $b = -\frac{1}{2}\left(\mu_i^T\Sigma^{-1}\mu_i - \mu_j^T\Sigma^{-1}\mu_j\right) + \left(\log \pi_i - \log \pi_j\right)$.

  6. Simplify the Logarithmic Term: By the quotient rule for logarithms ($\log A - \log B = \log \frac{A}{B}$): $\log \pi_i - \log \pi_j = \log \frac{\pi_i}{\pi_j}$.

  7. Simplify the Quadratic Term: We need to show that $\mu_i^T\Sigma^{-1}\mu_i - \mu_j^T\Sigma^{-1}\mu_j = (\mu_i + \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)$. Expanding the right-hand side of this proposed equality: $(\mu_i + \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j) = \mu_i^T\Sigma^{-1}\mu_i - \mu_i^T\Sigma^{-1}\mu_j + \mu_j^T\Sigma^{-1}\mu_i - \mu_j^T\Sigma^{-1}\mu_j$. Since $\Sigma$ is a symmetric covariance matrix, its inverse $\Sigma^{-1}$ is also symmetric, so the scalar $\mu_i^T\Sigma^{-1}\mu_j$ equals its transpose $\mu_j^T\Sigma^{-1}\mu_i$, and the two middle terms cancel: $-\mu_i^T\Sigma^{-1}\mu_j + \mu_j^T\Sigma^{-1}\mu_i = 0$. This leaves $(\mu_i + \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j) = \mu_i^T\Sigma^{-1}\mu_i - \mu_j^T\Sigma^{-1}\mu_j$, as required.

  8. Final Substitution: Substitute the simplified logarithmic and quadratic terms back into the equation for $b$: $b = -\frac{1}{2}(\mu_i + \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j) + \log \frac{\pi_i}{\pi_j}$. This matches the required expression for $b$, completing the proof.
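The derivation above can be checked numerically: with a shared covariance, two class means, and priors chosen arbitrarily (the values below are hypothetical, not from the original problem), the difference of the part-(a) discriminants should equal $w^T x + b$ with the derived $w$ and $b$ at any point $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Hypothetical parameters: a shared symmetric positive-definite covariance,
# two class means, and class priors.
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)
mu_i, mu_j = rng.standard_normal(d), rng.standard_normal(d)
pi_i, pi_j = 0.3, 0.7

# Discriminant from part (a): g_k(x) = w_k^T x + b_k
def g(x, mu, pi):
    w_k = Sigma_inv @ mu
    b_k = -0.5 * mu @ Sigma_inv @ mu + np.log(pi)
    return w_k @ x + b_k

# Boundary parameters derived in steps 4 and 8
w = Sigma_inv @ (mu_i - mu_j)
b = -0.5 * (mu_i + mu_j) @ Sigma_inv @ (mu_i - mu_j) + np.log(pi_i / pi_j)

# g_i(x) - g_j(x) should equal w^T x + b for any x,
# so the zero set of w^T x + b is exactly the decision boundary.
x = rng.standard_normal(d)
assert np.isclose(g(x, mu_i, pi_i) - g(x, mu_j, pi_j), w @ x + b)

# The quadratic identity from step 7 also holds:
assert np.isclose(mu_i @ Sigma_inv @ mu_i - mu_j @ Sigma_inv @ mu_j,
                  (mu_i + mu_j) @ Sigma_inv @ (mu_i - mu_j))
```

Note that the symmetry of $\Sigma^{-1}$ (guaranteed here by constructing `Sigma` as `A @ A.T` plus a diagonal term) is exactly what makes the step-7 cancellation, and hence the second assertion, go through.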