-
Decomposition:
We decompose each weight $\theta_i$ into two non-negative parts: $\theta_i = \theta_i^+ - \theta_i^-$, where $\theta_i^+, \theta_i^- \ge 0$.
Usually, we intend $\theta_i^+ = \max(0, \theta_i)$ and $\theta_i^- = \max(0, -\theta_i)$. In this case, at least one of them is zero.
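
As a small illustration of the intended split, the sketch below computes the two parts for a few example weights. The function name `split_weight` and the sample values are assumptions made for this example only.

```python
def split_weight(theta):
    """Return (theta_plus, theta_minus) such that theta = theta_plus - theta_minus."""
    theta_plus = max(0.0, theta)    # positive part
    theta_minus = max(0.0, -theta)  # negative part, stored as a non-negative number
    return theta_plus, theta_minus


weights = [1.5, -0.75, 0.0]  # illustrative values, not taken from the text
for w in weights:
    p, n = split_weight(w)
    assert p >= 0.0 and n >= 0.0  # both parts are non-negative
    assert p - n == w             # the difference recovers the weight
    assert p * n == 0.0           # at least one part is zero
```

With this particular split the two parts never overlap, which is exactly the property the argument that follows shows the optimizer enforces on its own.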
-
The Objective Difference:
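Eq (3.62) uses $|\theta_i^+ - \theta_i^-|$.
Eq (3.63) uses $(\theta_i^+ + \theta_i^-)$.
Since $\theta_i^+, \theta_i^- \ge 0$, we know that $|\theta_i^+ - \theta_i^-| \le \theta_i^+ + \theta_i^-$, with equality holding if and only if at least one of $\theta_i^+$ or $\theta_i^-$ is zero (i.e., $\theta_i^+ \cdot \theta_i^- = 0$).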
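
The inequality can be checked numerically. In the sketch below, `l1_gap` is a hypothetical helper that returns how much the sum exceeds the absolute difference; for non-negative inputs this gap equals twice the smaller of the two parts, so it vanishes exactly when one part is zero.

```python
def l1_gap(theta_plus, theta_minus):
    """Return (theta_plus + theta_minus) - |theta_plus - theta_minus| for non-negative inputs."""
    return (theta_plus + theta_minus) - abs(theta_plus - theta_minus)


# One part is zero: the two expressions agree, so the gap is zero.
assert l1_gap(3.0, 0.0) == 0.0
assert l1_gap(0.0, 2.5) == 0.0

# Both parts are positive: the sum is strictly larger, by 2 * min(3.0, 1.0).
assert l1_gap(3.0, 1.0) == 2.0
```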
-
Optimization Logic:
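Suppose we have a candidate solution where, for some index $i$, both $\theta_i^+ > 0$ and $\theta_i^- > 0$.
Let $m = \min(\theta_i^+, \theta_i^-) > 0$.
We can create a new solution:
$\tilde{\theta}_i^+ = \theta_i^+ - m$
$\tilde{\theta}_i^- = \theta_i^- - m$
Both new parts are still non-negative because $m$ is the smaller of the two, so the new solution is feasible.
The actual weight $\theta_i$ remains unchanged: $\tilde{\theta}_i^+ - \tilde{\theta}_i^- = (\theta_i^+ - m) - (\theta_i^- - m) = \theta_i^+ - \theta_i^- = \theta_i$.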
The first term (loss term) depends only on the difference, so it is unchanged.
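However, consider the penalty term in (3.63):
The old sum was $(\theta_i^+ + \theta_i^-)$.
The new sum is $(\tilde{\theta}_i^+ + \tilde{\theta}_i^-) = (\theta_i^+ - m + \theta_i^- - m) = (\theta_i^+ + \theta_i^-) - 2m$.
Since $m > 0$, the penalty is strictly smaller, and so is the objective value (provided the penalty term carries a positive weight).
Therefore, any solution where both are positive is not optimal. The optimizer will drive at least one of them to zero to minimize the objective.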
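
The construction above can be traced on concrete numbers. This is a minimal sketch; the helper name `remove_overlap` and the starting values are assumptions chosen for illustration.

```python
def remove_overlap(theta_plus, theta_minus):
    """Subtract m = min(theta_plus, theta_minus) from both parts."""
    m = min(theta_plus, theta_minus)
    return theta_plus - m, theta_minus - m


p, n = 2.0, 0.5              # both parts positive, so m = 0.5
new_p, new_n = remove_overlap(p, n)

assert (new_p, new_n) == (1.5, 0.0)            # still non-negative, one part is now zero
assert new_p - new_n == p - n                  # the weight (and hence the loss) is unchanged
assert (p + n) - (new_p + new_n) == 2 * 0.5    # the penalty sum drops by 2m
```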
-
Conclusion:
At the optimum, for every $i$, either $\theta_i^+ = 0$ or $\theta_i^- = 0$ (or both).
In this case, $|\theta_i^+ - \theta_i^-| = \theta_i^+ + \theta_i^-$.
Thus, minimizing (3.63) automatically leads to a solution satisfying this property, making it equivalent to minimizing (3.62).
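
To see the conclusion in action, here is a hedged one-dimensional sketch. It assumes a squared-error loss $\frac{1}{2}(\theta - a)^2$ and a penalty weight `lam > 0`, neither of which is specified in the text, and it uses projected gradient descent purely as an illustrative solver. The function name, step size, iteration count, and starting point are likewise assumptions. Starting from a point where both parts are positive, the iteration ends with one part exactly zero, and the recovered weight matches the known soft-thresholding minimizer of the absolute-value form.

```python
def solve_split_1d(a, lam, step=0.1, iters=500, p0=1.0, n0=1.0):
    """Minimize 0.5 * ((p - n) - a)**2 + lam * (p + n) subject to p, n >= 0."""
    p, n = p0, n0
    for _ in range(iters):
        residual = (p - n) - a
        grad_p = residual + lam    # derivative with respect to the positive part
        grad_n = -residual + lam   # derivative with respect to the negative part
        # Gradient step followed by projection onto the non-negative orthant.
        p = max(0.0, p - step * grad_p)
        n = max(0.0, n - step * grad_n)
    return p, n


p, n = solve_split_1d(a=2.0, lam=0.5)

assert n == 0.0                        # the negative part was driven to zero
assert abs((p - n) - 1.5) < 1e-9       # soft-threshold of 2.0 at 0.5 is 1.5
assert abs((p + n) - abs(p - n)) < 1e-12  # the two penalties coincide at the optimum
```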