A Fully Single Loop Algorithm for Bilevel Optimization without Hessian
Inverse
- URL: http://arxiv.org/abs/2112.04660v2
- Date: Fri, 10 Dec 2021 14:35:02 GMT
- Title: A Fully Single Loop Algorithm for Bilevel Optimization without Hessian
Inverse
- Authors: Junyi Li, Bin Gu, Heng Huang
- Abstract summary: We propose a new Hessian inverse free Fully Single Loop Algorithm for bilevel optimization problems.
We show that our algorithm converges with the rate of $O(epsilon-2)$.
- Score: 121.54116938140754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a new Hessian inverse free Fully Single Loop
Algorithm (FSLA) for bilevel optimization problems. Classic algorithms for
bilevel optimization admit a double loop structure which is computationally
expensive. Recently, several single loop algorithms have been proposed with
optimizing the inner and outer variable alternatively. However, these
algorithms not yet achieve fully single loop. As they overlook the loop needed
to evaluate the hyper-gradient for a given inner and outer state. In order to
develop a fully single loop algorithm, we first study the structure of the
hyper-gradient and identify a general approximation formulation of
hyper-gradient computation that encompasses several previous common approaches,
e.g. back-propagation through time, conjugate gradient, \emph{etc.} Based on
this formulation, we introduce a new state variable to maintain the historical
hyper-gradient information. Combining our new formulation with the alternative
update of the inner and outer variables, we propose an efficient fully single
loop algorithm. We theoretically show that the error generated by the new state
can be bounded and our algorithm converges with the rate of $O(\epsilon^{-2})$.
Finally, we verify the efficacy our algorithm empirically through multiple
bilevel optimization based machine learning tasks.
Related papers
- Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - A Single-Loop Algorithm for Decentralized Bilevel Optimization [11.67135350286933]
We propose a novel single-loop algorithm for solving decentralized bilevel optimization with a strongly convex lower-level problem.
Our approach is a fully single-loop method that approximates the hypergradient using only two matrix-vector multiplications per iteration.
Our analysis demonstrates that the proposed algorithm achieves the best-known convergence rate for bilevel optimization algorithms.
arXiv Detail & Related papers (2023-11-15T13:29:49Z) - Will Bilevel Optimizers Benefit from Loops [63.22466953441521]
Two current popular bilevelimats AID-BiO and ITD-BiO naturally involve solving one or two sub-problems.
We first establish unified convergence analysis for both AID-BiO and ITD-BiO that are applicable to all implementation choices of loops.
arXiv Detail & Related papers (2022-05-27T20:28:52Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $mathcalO(epsilon-1.5)$, which outperforms all existing algorithms by the order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Optimal and Practical Algorithms for Smooth and Strongly Convex
Decentralized Optimization [21.555331273873175]
We consider the task of decentralized minimization of the sum of smooth strongly convex functions stored across the nodes of a network.
We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.
arXiv Detail & Related papers (2020-06-21T11:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.