The Positive-Negative Decomposition Trick
When optimizing mathematically, absolute value terms like $|x|$ are incredibly annoying because they have a "V" shape: the function abruptly changes direction at zero, so it is not differentiable there. Most standard, fast optimizers require smooth, differentiable objectives.
Overcoming the Absolute Value
How do we remove the $|\cdot|$ while preserving the problem? We split any real number $x$ into two non-negative components: its "positive part" ($x^+ = \max(x, 0)$) and its "negative part" ($x^- = \max(-x, 0)$).
- If $x \ge 0$, then $x^+ = x$ and $x^- = 0$.
- If $x < 0$, then $x^+ = 0$ and $x^- = -x$.
Notice how in both of these ideal cases:

$$x = x^+ - x^-, \qquad |x| = x^+ + x^-.$$
This transformation perfectly replaces the absolute value with pure addition, but it only works if at least one of the two variables $x^+, x^-$ is zero.
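The decomposition above can be sketched in a few lines. The helper `split` below is an illustrative name, not part of any library; it just applies the $\max$-based definitions and checks the two identities:

```python
def split(x):
    """Decompose x into its positive part max(x, 0) and negative part max(-x, 0)."""
    x_pos = max(x, 0.0)   # positive part: x if x >= 0, else 0
    x_neg = max(-x, 0.0)  # negative part: -x if x < 0, else 0
    return x_pos, x_neg

for x in (3.0, -4.5, 0.0):
    p, n = split(x)
    assert p - n == x             # recovers the original value
    assert p + n == abs(x)        # recovers the absolute value
    assert p == 0.0 or n == 0.0   # at least one part is always zero
```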
If we cheat and set $x^+ = 7$, $x^- = 5$ to represent $x = 2$, then $x^+ + x^- = 12$, which does not equal $|x| = 2$.
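A quick numerical check of this failure mode (the specific numbers here are just one illustrative "cheating" choice):

```python
# A "cheating" representation of x = 2 with BOTH parts strictly positive
p, n = 7.0, 5.0
assert p - n == 2.0           # the value x is still recovered correctly...
assert p + n == 12.0          # ...but the would-be "absolute value" is 12
assert p + n != abs(p - n)    # the identity |x| = p + n breaks when both parts are nonzero
```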
The Optimizer's Greed
Why are we guaranteed that the optimizer won't cheat?
Because the optimizer is trying to find the minimum cost! The term $x^+ + x^-$ acts as a penalty. If the optimizer picked $x^+ = 7$ and $x^- = 5$ to represent $x = 2$, it pays a massive penalty of $7 + 5 = 12$.
If it shifts both parts down by $5$ units each, to $x^+ = 2$ and $x^- = 0$, the exact same prediction is made (since $7 - 5 = 2 - 0 = 2$), but the penalty drops strictly to $2$. The optimizer is "lazy" and greedy; it will never pay $12$ when it can pay $2$ for the exact same predictive behavior.
Thus, the optimizer mathematically guarantees that at least one of the two parts converges to zero, effectively replacing the jagged absolute-value plot with two smooth, mutually exclusive linear pieces.
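This guarantee can be observed with an off-the-shelf LP solver. The sketch below (assuming NumPy and SciPy are available; the four data points are made up, with a deliberate outlier at $x = 4$) fits a line $y \approx a + bx$ by least absolute deviations: each residual is split into $r^+_i - r^-_i$ exactly as described, and the objective $\sum_i (r^+_i + r^-_i)$ stands in for $\sum_i |r_i|$. At the optimum, every pair has at least one member at zero.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: three collinear points plus one outlier at x = 4
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([1.0, 2.0, 3.0, 8.0])
m = len(xs)

# Design matrix for the model y ~ a + b*x
A = np.column_stack([np.ones(m), xs])

# Decision vector: [a, b, r_plus (m entries), r_minus (m entries)]
# Equality constraints encode: a + b*x_i - r_plus_i + r_minus_i = y_i
A_eq = np.hstack([A, -np.eye(m), np.eye(m)])
b_eq = ys

# Objective: sum of r_plus + r_minus, i.e. the total absolute residual
c = np.concatenate([np.zeros(2), np.ones(2 * m)])

# a and b are free; the split parts must be non-negative
bounds = [(None, None)] * 2 + [(0, None)] * (2 * m)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
r_plus, r_minus = res.x[2:2 + m], res.x[2 + m:]

print("coefficients (a, b):", res.x[:2])
print("total absolute residual:", res.fun)
# The optimizer never "pays double": each elementwise product is ~0
print("largest r_plus[i] * r_minus[i]:", np.max(r_plus * r_minus))
```

Greed does the work here: any solution with both $r^+_i > 0$ and $r^-_i > 0$ could shrink both by their minimum, staying feasible while strictly lowering the cost, so the solver never returns one.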