Answer

Prerequisites

  • Hyperplane Equation: From part (b), the decision boundary is $w^T x + b = 0$.
  • Mahalanobis Distance: The squared Mahalanobis distance between two vectors $u$ and $v$ with respect to a covariance matrix $\Sigma$ is defined as $\|u - v\|_\Sigma^2 = (u - v)^T \Sigma^{-1} (u - v)$.
  • Vector Transpose Properties: For a scalar $c$ resulting from a vector product $u^T M v$, the transpose equals the scalar itself: $c = c^T = v^T M^T u$. If $M$ is symmetric (as $\Sigma^{-1}$ is), then $u^T M v = v^T M u$.
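The squared Mahalanobis distance above translates directly into code. A minimal NumPy sketch (the function name and example values are illustrative, not from the original):

```python
import numpy as np

def mahalanobis_sq(u, v, Sigma):
    """Squared Mahalanobis distance (u - v)^T Sigma^{-1} (u - v)."""
    d = u - v
    # Solving Sigma y = d avoids forming Sigma^{-1} explicitly.
    return float(d @ np.linalg.solve(Sigma, d))

# With Sigma = I the distance reduces to the squared Euclidean distance.
u = np.array([1.0, 2.0])
v = np.array([0.0, 0.0])
print(mahalanobis_sq(u, v, np.eye(2)))  # 5.0
```

Note the symmetry-based identity from the third bullet holds here as well: swapping `u` and `v` leaves the distance unchanged.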

Step-by-Step Derivation

  1. Target Form: We want to rewrite the hyperplane equation $w^T x + b = 0$ in the form $w^T(x - x_0) = 0$. Expanding the target form gives
     $$w^T x - w^T x_0 = 0.$$
     Comparing this with $w^T x + b = 0$, we see that $x_0$ must satisfy
     $$w^T x_0 = -b.$$

  2. Substitute Knowns: From part (b), we have
     $$w = \Sigma^{-1}(\mu_i - \mu_j), \qquad -b = \frac{1}{2}(\mu_i + \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j) - \log \frac{\pi_i}{\pi_j}.$$
     We are given the proposed expression for $x_0$:
     $$x_0 = \frac{\mu_i + \mu_j}{2} - \frac{\mu_i - \mu_j}{\|\mu_i - \mu_j\|_\Sigma^2} \log \frac{\pi_i}{\pi_j}.$$

  3. Evaluate $w^T x_0$: We calculate $w^T x_0$ from the given definitions and show that it equals $-b$:
     $$w^T x_0 = \left( \Sigma^{-1}(\mu_i - \mu_j) \right)^T \left[ \frac{\mu_i + \mu_j}{2} - \frac{\mu_i - \mu_j}{\|\mu_i - \mu_j\|_\Sigma^2} \log \frac{\pi_i}{\pi_j} \right].$$
     Since $\Sigma^{-1}$ is symmetric, $(\Sigma^{-1}(\mu_i - \mu_j))^T = (\mu_i - \mu_j)^T (\Sigma^{-1})^T = (\mu_i - \mu_j)^T \Sigma^{-1}$, so
     $$w^T x_0 = (\mu_i - \mu_j)^T \Sigma^{-1} \left[ \frac{\mu_i + \mu_j}{2} - \frac{\mu_i - \mu_j}{\|\mu_i - \mu_j\|_\Sigma^2} \log \frac{\pi_i}{\pi_j} \right].$$

  4. Distribute the Terms: Multiplying $(\mu_i - \mu_j)^T \Sigma^{-1}$ into the brackets gives
     $$w^T x_0 = \frac{1}{2} (\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i + \mu_j) - \frac{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}{\|\mu_i - \mu_j\|_\Sigma^2} \log \frac{\pi_i}{\pi_j}.$$

  5. Simplify the Expression:

    • First term: Note that $(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i + \mu_j)$ is a scalar. Its transpose is $(\mu_i + \mu_j)^T (\Sigma^{-1})^T (\mu_i - \mu_j) = (\mu_i + \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)$. Since a scalar equals its transpose, the first term can be rewritten as $\frac{1}{2} (\mu_i + \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)$.
    • Second term: By definition, the numerator $(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)$ is exactly the squared Mahalanobis distance $\|\mu_i - \mu_j\|_\Sigma^2$, so the fraction equals 1.

    Substituting these simplifications back:
    $$w^T x_0 = \frac{1}{2} (\mu_i + \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j) - \log \frac{\pi_i}{\pi_j}.$$

  6. Conclusion: We have shown that $w^T x_0 = -b$. Therefore the equation $w^T x + b = 0$ is equivalent to $w^T x - w^T x_0 = 0$, which is $w^T(x - x_0) = 0$.
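The identity $w^T x_0 = -b$ can also be checked numerically. A sketch with NumPy, using arbitrary illustrative parameters (the means, covariance, and priors below are assumptions, not from the original problem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative class parameters (arbitrary assumptions).
mu_i = np.array([1.0, 2.0])
mu_j = np.array([-1.0, 0.5])
A = rng.normal(size=(2, 2))
Sigma = A @ A.T + 2.0 * np.eye(2)   # symmetric positive definite
pi_i, pi_j = 0.7, 0.3

Sigma_inv = np.linalg.inv(Sigma)
diff = mu_i - mu_j

# Quantities from part (b): w and b (note -b is the expression in Step 2).
w = Sigma_inv @ diff
b = -0.5 * (mu_i + mu_j) @ Sigma_inv @ diff + np.log(pi_i / pi_j)

# Proposed anchor point x0 from Step 2.
maha_sq = diff @ Sigma_inv @ diff  # ||mu_i - mu_j||_Sigma^2
x0 = (mu_i + mu_j) / 2 - diff / maha_sq * np.log(pi_i / pi_j)

# The derivation claims w^T x0 = -b.
print(np.isclose(w @ x0, -b))  # True
```

Because the derivation is purely algebraic, this holds for any positive-definite $\Sigma$, any pair of means, and any positive priors.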

Interpretation

  • Interpretation of $w$: The vector $w = \Sigma^{-1}(\mu_i - \mu_j)$ is the normal vector to the decision hyperplane; it determines the orientation (tilt) of the boundary. It points in the general direction from the mean of class $j$ to the mean of class $i$, but is skewed by the inverse covariance matrix $\Sigma^{-1}$ to account for the shape and spread of the data distribution.
  • Interpretation of $x_0$: The point $x_0$ lies exactly on the decision hyperplane (since $w^T(x_0 - x_0) = 0$). It acts as an anchor point, or origin, for the boundary.
  • Effect of the priors $\{\pi_i, \pi_j\}$ on $x_0$: The formula for $x_0$ consists of two parts: the midpoint $\frac{\mu_i + \mu_j}{2}$ and a shift term.
    • If the classes are equally probable ($\pi_i = \pi_j$), then $\log(\pi_i/\pi_j) = \log(1) = 0$. The shift term vanishes, and $x_0$ is exactly halfway between the two class means.
    • If class $i$ is more probable ($\pi_i > \pi_j$), then $\log(\pi_i/\pi_j) > 0$. The shift term subtracts a vector pointing from $\mu_j$ to $\mu_i$, moving the anchor point $x_0$ away from $\mu_i$ and towards $\mu_j$. Geometrically, this shifts the decision boundary towards the less probable class $j$, expanding the decision region assigned to the more probable class $i$.
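The prior-induced shift of $x_0$ is easy to visualize numerically. A small sketch with identity covariance (an assumption chosen so the geometry is transparent; all values are illustrative):

```python
import numpy as np

# Two class means on the x-axis, identity covariance (illustrative choices).
mu_i = np.array([2.0, 0.0])
mu_j = np.array([-2.0, 0.0])
Sigma_inv = np.eye(2)
diff = mu_i - mu_j
maha_sq = diff @ Sigma_inv @ diff  # ||mu_i - mu_j||_Sigma^2 = 16

def x0(pi_i, pi_j):
    """Anchor point of the decision boundary for priors (pi_i, pi_j)."""
    return (mu_i + mu_j) / 2 - diff / maha_sq * np.log(pi_i / pi_j)

print(x0(0.5, 0.5))  # equal priors: x0 is the midpoint [0, 0]
print(x0(0.9, 0.1))  # class i more probable: x0 shifts towards mu_j (negative x)
```

Increasing $\pi_i$ further pushes $x_0$ closer to $\mu_j$, matching the interpretation above: the boundary retreats towards the less probable class.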