Intuition

Imagine you are trying to classify a new data point $x$ into one of several categories. To make the best decision, you want to calculate a "score" for each category and simply pick the one with the highest score.

This problem shows that when all categories have data that is spread out in the exact same way (they share the same "shape", or covariance matrix $\Sigma$), calculating this score becomes surprisingly simple!

Instead of dealing with the complex, bell-shaped curve formula of the Gaussian distribution, the math simplifies down to a basic straight-line equation: $g_j(x) = w_j^T x + b_j$
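To see where the straight-line form comes from, here is a short sketch of the algebra. It assumes the score is the log of the prior times the Gaussian density, with $d$-dimensional $x$, and it uses a symbol $c$ (not in the original text) for the Gaussian normalizing term, which is the same for every class because $\Sigma$ is shared.

```latex
\begin{align*}
\log\big(\pi_j \, p(x \mid j)\big)
  &= -\tfrac{1}{2}(x - \mu_j)^T \Sigma^{-1} (x - \mu_j) + \log \pi_j + c \\
  &= \underbrace{-\tfrac{1}{2} x^T \Sigma^{-1} x + c}_{\text{same for every class } j}
     + \mu_j^T \Sigma^{-1} x
     - \tfrac{1}{2} \mu_j^T \Sigma^{-1} \mu_j + \log \pi_j \\[4pt]
g_j(x)
  &= \underbrace{(\Sigma^{-1} \mu_j)^T}_{w_j^T} x
     + \underbrace{\left(-\tfrac{1}{2} \mu_j^T \Sigma^{-1} \mu_j + \log \pi_j\right)}_{b_j}
   = w_j^T x + b_j
\end{align*}
```

Dropping the shared terms changes every class's score by the same amount, so the class with the highest score stays the same. The last line also uses the fact that $\Sigma^{-1}$ is symmetric, so $\mu_j^T \Sigma^{-1} x = (\Sigma^{-1} \mu_j)^T x$.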

Here is what the two parts of this simple equation mean:

  1. The Weight ($w_j$): Think of $w_j = \Sigma^{-1} \mu_j$ as a "template" for class $j$. It points toward the center of class $j$ ($\mu_j$), adjusted by the shape of the data ($\Sigma^{-1}$). The product $w_j^T x$ measures how well your new data point $x$ aligns with this template: the further $x$ extends along the template direction, the higher this part of the score will be.
  2. The Bias ($b_j$): This is a baseline adjustment for the score. It does two things:
    • Distance Penalty: The term $-\frac{1}{2} \mu_j^T \Sigma^{-1} \mu_j$ penalizes classes whose centers are very far away from the origin (measured relative to the shape $\Sigma$). This offsets the larger weights such classes get in $w_j$, so a far-away class does not win just because its template is big.
    • Popularity Bonus: The term $\log \pi_j$ gives a boost to classes that are more common overall (higher prior probability $\pi_j$). If you are unsure, it's safer to guess the more popular class!
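The weight and bias described above can be sketched in a few lines of NumPy. This is only an illustration under assumed inputs: the function names `lda_params` and `scores`, and the toy means, covariance, and priors, are made up for this example and are not part of the original problem.

```python
import numpy as np

def lda_params(means, cov, priors):
    """Return (W, b): row j of W is w_j = cov^{-1} mu_j,
    and b[j] = -0.5 * mu_j^T cov^{-1} mu_j + log(pi_j)."""
    means = np.asarray(means, dtype=float)    # shape (k, d): one class center per row
    cov = np.asarray(cov, dtype=float)        # shape (d, d): shared covariance
    priors = np.asarray(priors, dtype=float)  # shape (k,): class priors
    # Solve cov @ W.T = means.T rather than forming the inverse explicitly.
    W = np.linalg.solve(cov, means.T).T
    # Distance penalty plus popularity bonus, one value per class.
    b = -0.5 * np.sum(means * W, axis=1) + np.log(priors)
    return W, b

def scores(x, W, b):
    """Linear score g_j(x) = w_j^T x + b_j for every class j."""
    return W @ x + b

# Hypothetical two-class toy problem in two dimensions.
means = [[0.0, 0.0], [2.0, 1.0]]
cov = np.array([[1.0, 0.3], [0.3, 0.5]])
priors = [0.7, 0.3]

W, b = lda_params(means, cov, priors)
x = np.array([1.5, 0.5])
g = scores(x, W, b)
predicted = int(np.argmax(g))  # index of the class with the highest score
```

Using `np.linalg.solve` instead of inverting `cov` is a common numerical choice, not something the original explanation requires. Note that the class centered at the origin gets a zero weight vector and no distance penalty, so its score is just its popularity bonus.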

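In summary, because the "shape" of the data is the same for all classes, the parts of the Gaussian formula that do not depend on the class (the normalizing constant and the term that is quadratic in $x$) are identical for every category and cancel out of the comparison, leaving us with a simple, elegant linear scoring system.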