A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization
- URL: http://arxiv.org/abs/2102.07367v1
- Date: Mon, 15 Feb 2021 07:10:33 GMT
- Title: A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization
- Authors: Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang
and Zhuoran Yang
- Abstract summary: We propose a new algorithm -- the Momentum-assisted Single-timescale Stochastic Approximation (MSTSA) -- for tackling unconstrained stochastic bilevel optimization problems.
MSTSA allows us to gradually control the error in the iterations caused by inaccurate solutions to the lower level subproblem.
- Score: 112.59170319105971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new algorithm -- the Momentum-assisted Single-timescale
Stochastic Approximation (MSTSA) -- for tackling unconstrained bilevel
optimization problems. We focus on bilevel problems where the lower level
subproblem is strongly-convex. Unlike prior works which rely on two timescale
or double loop techniques that track the optimal solution to the lower level
subproblem, we design a stochastic momentum assisted gradient estimator for the
upper level subproblem's updates. The latter allows us to gradually control the
error in stochastic gradient updates due to the inaccurate solution to the lower
level subproblem. We show that if the upper objective function is smooth but
possibly non-convex (resp. strongly-convex), MSTSA requires
$\mathcal{O}(\epsilon^{-2})$ (resp. $\mathcal{O}(\epsilon^{-1})$) iterations
(each using constant samples) to find an $\epsilon$-stationary (resp.
$\epsilon$-optimal) solution. This achieves the best-known guarantees for
stochastic bilevel problems. We validate our theoretical results by showing the
efficiency of the MSTSA algorithm on hyperparameter optimization and data
hyper-cleaning problems.
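As a rough illustration of the momentum-assisted, single-timescale update described above, the following minimal sketch runs on a toy one-dimensional bilevel problem; the problem, the step sizes alpha/beta/eta, and the noise model are illustrative assumptions, not the authors' exact algorithm or constants.

import numpy as np

# Toy bilevel problem (illustrative, not from the paper):
#   lower level: y*(x) = argmin_y g(x, y),  g(x, y) = 0.5 * (y - a*x)**2  (strongly convex in y)
#   upper level: min_x f(y*(x)),            f(y) = 0.5 * (y - b)**2
# so the hypergradient is a * (y*(x) - b) and the optimum is x* = b / a.
rng = np.random.default_rng(0)
a, b, noise = 2.0, 3.0, 0.1

def grad_y_g(x, y):
    # noisy lower-level gradient in y
    return (y - a * x) + noise * rng.standard_normal()

def hypergrad(x, y, xi):
    # noisy surrogate hypergradient evaluated at the current y (which only tracks y*(x));
    # xi is passed in so two evaluations can reuse the same random sample
    return a * (y - b) + noise * xi

x, y = 0.0, 0.0
alpha, beta, eta = 0.05, 0.1, 0.1  # single timescale: constant, same-order step sizes
d = hypergrad(x, y, rng.standard_normal())  # momentum-assisted gradient estimate

for t in range(2000):
    x_old, y_old = x, y
    x -= alpha * d                    # upper-level update using the momentum estimate
    y -= beta * grad_y_g(x_old, y)    # one lower-level step per iteration (no inner loop)
    xi = rng.standard_normal()        # one sample reused at both the old and new iterates
    d = hypergrad(x, y, xi) + (1 - eta) * (d - hypergrad(x_old, y_old, xi))

print(f"x = {x:.3f} (target {b / a:.3f}), y = {y:.3f} (y*(x) = {a * x:.3f})")

The last line is a momentum correction of the recursive (STORM-style) kind: by reusing one sample at the old and new iterates it gradually cancels gradient noise without large batches or a double loop. The paper's exact estimator may differ in its details.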
Related papers
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex.
These problems have significant applications in machine learning, such as text classification using neural networks.
arXiv Detail & Related papers (2024-09-28T02:30:44Z)
- Strictly Low Rank Constraint Optimization -- An Asymptotically $\mathcal{O}(\frac{1}{t^2})$ Method [5.770309971945476]
We consider a class of nonconvex and nonsmooth problems with rank regularization to promote sparsity in the optimal solution.
We show that our algorithms are able to achieve a convergence rate of $\mathcal{O}(\frac{1}{t^2})$, which is exactly the same as Nesterov's optimal convergence rate for first-order methods on smooth convex problems.
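For context, $\mathcal{O}(\frac{1}{t^2})$ is Nesterov's optimal rate for first-order methods on smooth convex problems; below is a minimal sketch of the classical accelerated scheme on a smooth convex quadratic (the textbook method, not the cited paper's rank-regularized algorithm).

import numpy as np

# Classical Nesterov accelerated gradient descent on f(x) = 0.5*x'Ax - b'x,
# a smooth convex quadratic whose smoothness constant L is the largest eigenvalue of A.
A = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 1.0, 1.0])
L = 100.0

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

x = np.zeros(3)
y = x.copy()      # extrapolated point
t_k = 1.0
for _ in range(200):
    x_next = y - grad(y) / L                               # gradient step at y
    t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_k ** 2))   # momentum schedule
    y = x_next + ((t_k - 1.0) / t_next) * (x_next - x)     # extrapolation
    x, t_k = x_next, t_next

x_star = np.linalg.solve(A, b)
print("suboptimality f(x) - f(x*):", f(x) - f(x_star))     # decays like O(1/t^2)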
arXiv Detail & Related papers (2023-07-04T16:55:41Z)
- A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties.
We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
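A hedged sketch of the fully first-order idea, reusing the toy problem from the MSTSA sketch above: keep one variable y tracking argmin_y g(x, y) and another z tracking argmin_y f(x, y) + lam*g(x, y), and build a hypergradient surrogate from first-order gradients only. The penalty schedule and step sizes are illustrative assumptions, and F2SA's exact estimator may differ.

import numpy as np

# Same toy problem as the earlier sketch: g(x, y) = 0.5*(y - a*x)**2, f(y) = 0.5*(y - b)**2.
# Surrogate hypergradient using only first-order gradients:
#   h(x) = grad_x f(x, z) + lam * (grad_x g(x, z) - grad_x g(x, y))
# with y ~ argmin_y g(x, y) and z ~ argmin_y f(x, y) + lam * g(x, y).
a, b = 2.0, 3.0
x = y = z = 0.0

for k in range(1, 3001):
    lam = 0.5 * np.sqrt(k)                       # slowly increasing penalty (illustrative)
    y -= 0.1 * (y - a * x)                       # step on grad_y g
    z -= (0.1 / (1.0 + lam)) * ((z - b) + lam * (z - a * x))  # step on grad_y (f + lam*g)
    # grad_x f = 0 here; grad_x g(x, y) = -a * (y - a*x)
    h = lam * (-a * (z - a * x) + a * (y - a * x))
    x -= 0.05 * h                                # upper-level step on the surrogate

print(f"x = {x:.3f} (target {b / a:.3f})")

As lam grows, h approaches the true hypergradient a*(a*x - b), yet only gradients of f and g are ever evaluated; no second-order oracle is needed.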
arXiv Detail & Related papers (2023-01-26T05:34:21Z)
- Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Recent developments in bilevel optimization provide the first nonasymptotic analysis of multi-step methods for escaping saddle points.
arXiv Detail & Related papers (2022-02-08T07:10:06Z)
- Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel Optimization [62.87181271021217]
We consider nonconvex stochastic bilevel optimization (SBO) problems, which have many applications in machine learning.
This paper proposes fast randomized algorithms for these nonconvex SBO problems.
arXiv Detail & Related papers (2021-05-05T18:28:42Z)
- A Single-Timescale Stochastic Bilevel Optimization Method [34.868681953242934]
This paper develops a new method for a class of stochastic bilevel problems that we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method.
To achieve an $\epsilon$-stationary point of the bilevel problem, STABLE requires $\mathcal{O}(\epsilon^{-2})$ samples in total; and to achieve an $\epsilon$-optimal solution in the strongly convex case, STABLE requires $\mathcal{O}(\epsilon^{-1})$ samples.
arXiv Detail & Related papers (2021-02-09T06:35:30Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
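For contrast with the single-timescale sketches above, a generic two-timescale loop drives the lower-level variable with a larger step size than the upper-level one; the decay exponents below are illustrative, not the paper's exact choices.

import numpy as np

# Same toy bilevel problem as the earlier sketches: y*(x) = a*x, x* = b/a.
rng = np.random.default_rng(1)
a, b, noise = 2.0, 3.0, 0.1
x, y = 0.0, 0.0

for k in range(1, 5001):
    beta = 0.5 / np.sqrt(k)   # fast timescale: lower-level step size decays slowly
    alpha = 0.5 / k           # slow timescale: upper-level step size decays faster
    y -= beta * ((y - a * x) + noise * rng.standard_normal())
    x -= alpha * (a * (y - b) + noise * rng.standard_normal())

print(f"x = {x:.3f} (target {b / a:.3f})")

Since beta/alpha grows with k, y effectively equilibrates to y*(x) between upper-level moves; this separation of timescales is what MSTSA's momentum estimator is designed to avoid.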
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we study gradient-free (zeroth-order) minimax optimization in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance-reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)