The Curse of Unrolling: Rate of Differentiating Through Optimization
- URL: http://arxiv.org/abs/2209.13271v3
- Date: Fri, 25 Aug 2023 16:24:33 GMT
- Title: The Curse of Unrolling: Rate of Differentiating Through Optimization
- Authors: Damien Scieur, Quentin Bertrand, Gauthier Gidel, Fabian Pedregosa
- Abstract summary: Unrolled differentiation approximates the solution using an iterative solver and differentiates it through the computational path.
We show that we can either 1) choose a large learning rate leading to fast asymptotic convergence but accept that the algorithm may have an arbitrarily long burn-in phase, or 2) choose a smaller learning rate leading to an immediate but slower convergence.
- Score: 35.327233435055305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computing the Jacobian of the solution of an optimization problem is a
central problem in machine learning, with applications in hyperparameter
optimization, meta-learning, optimization as a layer, and dataset distillation,
to name a few. Unrolled differentiation is a popular heuristic that
approximates the solution using an iterative solver and differentiates it
through the computational path. This work provides a non-asymptotic
convergence-rate analysis of this approach on quadratic objectives for gradient
descent and the Chebyshev method. We show that to ensure convergence of the
Jacobian, we can either 1) choose a large learning rate leading to a fast
asymptotic convergence but accept that the algorithm may have an arbitrarily
long burn-in phase or 2) choose a smaller learning rate leading to an immediate
but slower convergence. We refer to this phenomenon as the curse of unrolling.
Finally, we discuss open problems relative to this approach, such as deriving a
practical update rule for the optimal unrolling strategy and making novel
connections with the field of Sobolev orthogonal polynomials.
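A minimal sketch of the setup the abstract describes, assuming JAX and a toy quadratic instance chosen here for illustration (it is not the paper's experimental setup): gradient descent is unrolled for a fixed number of steps, and the Jacobian of the iterate with respect to a hyperparameter is compared against the exact Jacobian for a large and a small step size.

```python
import jax
import jax.numpy as jnp

# Toy quadratic f(x) = 0.5 * x^T A x - theta * b^T x, so the minimizer is
# x*(theta) = theta * A^{-1} b and the exact Jacobian dx*/dtheta = A^{-1} b.
A = jnp.diag(jnp.array([1.0, 10.0]))   # smoothness constant L = 10
b = jnp.array([1.0, 1.0])

def unrolled_solution(theta, step_size, num_steps):
    """Gradient descent on f, unrolled for a fixed number of steps;
    the whole loop stays on the autodiff tape."""
    x = jnp.zeros(2)
    for _ in range(num_steps):
        x = x - step_size * (A @ x - theta * b)
    return x

exact_jacobian = jnp.linalg.solve(A, b)

# Differentiate the unrolled iterate with respect to the hyperparameter theta.
for step_size in (1.9 / 10.0, 0.5 / 10.0):   # "large" vs "small" step size
    unrolled_jacobian = jax.jacobian(unrolled_solution)(1.0, step_size, 50)
    error = float(jnp.linalg.norm(unrolled_jacobian - exact_jacobian))
    print(f"step size {step_size:.3f}: Jacobian error {error:.2e}")
```

Sweeping num_steps instead of fixing it at 50 is what exposes the trade-off the abstract describes: the large step size eventually converges faster, but the Jacobian error can first go through a burn-in phase before it starts to decay.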
Related papers
- Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling the differentiable and non-differentiable parts separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Convergence Rates of Two-Time-Scale Gradient Descent-Ascent Dynamics for
Solving Nonconvex Min-Max Problems [2.0305676256390934]
We characterize the finite-time performance of the continuous-time variant of the simultaneous gradient descent-ascent algorithm.
Our results on the behavior of the continuous-time algorithm may be used to enhance the convergence properties of its discrete-time counterpart.
arXiv Detail & Related papers (2021-12-17T15:51:04Z) - A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
arXiv Detail & Related papers (2021-09-29T23:15:23Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian (see the sketch after this list).
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by the recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data analysis, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Divide and Learn: A Divide and Conquer Approach for Predict+Optimize [50.03608569227359]
The predict+optimize problem combines machine learning of problem coefficients with an optimization problem that uses the predicted coefficients.
We show how to directly express the loss of the optimization problem in terms of the predicted coefficients as a piece-wise linear function.
We propose a novel divide and conquer algorithm to tackle optimization problems without this restriction and predict its coefficients using the optimization loss.
arXiv Detail & Related papers (2020-12-04T00:26:56Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Adaptive First- and Zeroth-order Methods for Weakly Convex Stochastic
Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems.
Experimental results indicate that the proposed algorithms empirically outperform zeroth-order gradient descent and its design variants.
arXiv Detail & Related papers (2020-05-19T07:44:52Z)
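The implicit-differentiation entry above mentions forward-mode differentiation of proximal gradient descent. As a rough illustration only, assuming JAX and a toy Lasso instance chosen here (not the cited paper's code), the sketch below propagates derivatives through unrolled proximal gradient steps to obtain the Jacobian of the iterate with respect to the regularization parameter.

```python
import jax
import jax.numpy as jnp

# Assumed toy Lasso problem: min_beta 0.5 * ||X beta - y||^2 + lam * ||beta||_1.
X = jax.random.normal(jax.random.PRNGKey(0), (20, 5))
y = jax.random.normal(jax.random.PRNGKey(1), (20,))
step = 1.0 / jnp.linalg.norm(X, ord=2) ** 2   # 1/L with L = ||X||_2^2

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return jnp.sign(z) * jnp.maximum(jnp.abs(z) - t, 0.0)

def prox_gradient_descent(lam, num_steps=200):
    beta = jnp.zeros(X.shape[1])
    for _ in range(num_steps):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Forward-mode differentiation through the unrolled iterations gives the
# Jacobian d beta_k / d lam, which approaches the exact Jacobian as the
# number of iterations grows.
jacobian = jax.jacfwd(prox_gradient_descent)(0.1)
print(jacobian)
```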