Answer

Step-by-step Answer

  1. Decomposition: We decompose each weight $\theta_i$ into two non-negative parts: $\theta_i = \theta_i^+ - \theta_i^-$, where $\theta_i^+, \theta_i^- \ge 0$. Usually, we intend $\theta_i^+ = \max(0, \theta_i)$ and $\theta_i^- = \max(0, -\theta_i)$, in which case at least one of them is zero. The constraints alone do not force this choice, since adding the same constant $c \ge 0$ to both parts leaves $\theta_i$ unchanged.

  2. The Objective Difference: Eq (3.62) uses $|\theta_i^+ - \theta_i^-|$. Eq (3.63) uses $(\theta_i^+ + \theta_i^-)$. Since $\theta_i^+, \theta_i^- \ge 0$, we know that $|\theta_i^+ - \theta_i^-| \le \theta_i^+ + \theta_i^-$, with equality holding if and only if at least one of $\theta_i^+$ or $\theta_i^-$ is zero (i.e., $\theta_i^+ \cdot \theta_i^- = 0$).

  3. Optimization Logic: Suppose we have a candidate solution where for some index $i$, both $\theta_i^+ > 0$ and $\theta_i^- > 0$. Let $m = \min(\theta_i^+, \theta_i^-) > 0$. We can create a new solution: $\tilde{\theta}_i^+ = \theta_i^+ - m$ and $\tilde{\theta}_i^- = \theta_i^- - m$. Both new parts are still non-negative because $m$ is the smaller of the two, so the new solution is feasible. The actual weight $\theta_i$ remains unchanged: $\tilde{\theta}_i^+ - \tilde{\theta}_i^- = (\theta_i^+ - m) - (\theta_i^- - m) = \theta_i^+ - \theta_i^- = \theta_i$. The first term (loss term) depends only on the difference, so it is unchanged.

    However, consider the penalty term in (3.63): Sum was $(\theta_i^+ + \theta_i^-)$. New sum is $(\tilde{\theta}_i^+ + \tilde{\theta}_i^-) = (\theta_i^+ - m + \theta_i^- - m) = (\theta_i^+ + \theta_i^-) - 2m$. Since $m > 0$, the new objective value is strictly smaller! Therefore, any solution where both are positive is not optimal. The optimizer will drive at least one of them to zero to minimize the objective.

  4. Conclusion: At the optimum, for every $i$, either $\theta_i^+ = 0$ or $\theta_i^- = 0$ (or both). In this case, $|\theta_i^+ - \theta_i^-| = \theta_i^+ + \theta_i^-$. Thus, minimizing (3.63) automatically leads to a solution satisfying this property, making it equivalent to minimizing (3.62).
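
The shifting argument in step 3 can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names and the example values are made up for this check, and only the penalty terms are computed, since the loss term depends on $\theta_i^+ - \theta_i^-$ alone and is therefore unaffected by the shift.

```python
def l1_of_difference(theta_plus, theta_minus):
    """Penalty in the style of (3.62): sum over i of |theta_i^+ - theta_i^-|."""
    return sum(abs(p - q) for p, q in zip(theta_plus, theta_minus))


def sum_of_parts(theta_plus, theta_minus):
    """Penalty in the style of (3.63): sum over i of (theta_i^+ + theta_i^-)."""
    return sum(p + q for p, q in zip(theta_plus, theta_minus))


def shift_to_complementary(theta_plus, theta_minus):
    """Subtract m_i = min(theta_i^+, theta_i^-) from both parts of each coordinate."""
    new_plus, new_minus = [], []
    for p, q in zip(theta_plus, theta_minus):
        m = min(p, q)
        new_plus.append(p - m)   # stays >= 0 because m <= p
        new_minus.append(q - m)  # stays >= 0 because m <= q
    return new_plus, new_minus


# Hypothetical feasible point where coordinates 0 and 2 have both parts positive.
theta_plus = [3.0, 0.5, 2.0]
theta_minus = [1.0, 0.0, 2.5]

new_plus, new_minus = shift_to_complementary(theta_plus, theta_minus)

# The weights theta_i = theta_i^+ - theta_i^- are the same before and after.
theta_before = [p - q for p, q in zip(theta_plus, theta_minus)]
theta_after = [p - q for p, q in zip(new_plus, new_minus)]
print(theta_before == theta_after)                 # True

print(sum_of_parts(theta_plus, theta_minus))       # 9.0
print(sum_of_parts(new_plus, new_minus))           # 3.0
print(l1_of_difference(theta_plus, theta_minus))   # 3.0
```

After the shift, every coordinate has at least one part equal to zero, the sum-of-parts penalty drops from 9.0 to 3.0, and it now coincides with the absolute-value penalty, which is the equality case from step 2.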