BiAdam: Fast Adaptive Bilevel Optimization Methods
- URL: http://arxiv.org/abs/2106.11396v1
- Date: Mon, 21 Jun 2021 20:16:40 GMT
- Title: BiAdam: Fast Adaptive Bilevel Optimization Methods
- Authors: Feihu Huang and Heng Huang
- Abstract summary: Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful convergence analysis framework for both constrained and unconstrained bilevel optimization.
- Score: 104.96004056928474
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Bilevel optimization has recently attracted increased interest in machine learning due to its many applications, such as hyper-parameter optimization and policy optimization. Although several methods have recently been proposed to solve bilevel problems, these methods do not use adaptive learning rates. To fill this gap, in this paper we propose a class of fast and effective adaptive methods for solving bilevel optimization problems in which the outer problem is possibly nonconvex and the inner problem is strongly convex.
Specifically, we propose a fast single-loop BiAdam algorithm based on the basic
momentum technique, which achieves a sample complexity of
$\tilde{O}(\epsilon^{-4})$ for finding an $\epsilon$-stationary point. At the
same time, we propose an accelerated version of the BiAdam algorithm (VR-BiAdam) using a variance-reduction technique, which reaches the best-known sample
complexity of $\tilde{O}(\epsilon^{-3})$. To further reduce computation in
estimating derivatives, we propose a fast single-loop stochastic approximated BiAdam algorithm (saBiAdam) that avoids the Hessian inverse and still
achieves a sample complexity of $\tilde{O}(\epsilon^{-4})$ without large
batches. We further present an accelerated version of the saBiAdam algorithm (VR-saBiAdam), which also reaches the best-known sample complexity of $\tilde{O}(\epsilon^{-3})$. As in SUPER-ADAM \citep{huang2021super}, we apply unified adaptive matrices to our methods, which cover many types of adaptive learning rates. Moreover, our framework can flexibly use momentum and variance-reduction techniques. In particular, we provide a useful convergence analysis framework for both constrained and unconstrained bilevel optimization. To the best of our knowledge, this is the first work to study bilevel optimization methods with adaptive learning rates.
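For reference, the "Hessian inverse" mentioned in the abstract comes from the standard hypergradient of a bilevel problem (a well-known identity, not specific to this paper): for $\min_{x} F(x) := f(x, y^*(x))$ with $y^*(x) = \arg\min_{y} g(x, y)$ and $g$ strongly convex in $y$, one has $\nabla F(x) = \nabla_x f(x, y^*(x)) - \nabla^2_{xy} g(x, y^*(x))\,[\nabla^2_{yy} g(x, y^*(x))]^{-1} \nabla_y f(x, y^*(x))$.

The following is a minimal sketch of the single-loop adaptive structure the abstract describes, not the authors' exact BiAdam: one gradient step on the inner variable, momentum on the hypergradient estimate, and an Adam-style diagonal adaptive matrix for the outer update. The toy objectives, step sizes, and variable names below are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch (not the authors' exact BiAdam) of a single-loop adaptive
# bilevel update on a toy quadratic problem.
#   outer: f(x, y) = 0.5 * ||y - b||^2
#   inner: g(x, y) = 0.5 * ||y - A x||^2 + 0.5 * mu * ||y||^2   (strongly convex in y)
import numpy as np

rng = np.random.default_rng(0)
d, p = 5, 5
A = rng.normal(size=(p, d))
b = rng.normal(size=p)
mu = 0.5  # inner strong-convexity parameter

def grad_y_g(x, y):
    # Gradient of the inner objective with respect to y.
    return (y - A @ x) + mu * y

def hypergrad(x, y):
    # Implicit-function-theorem hypergradient evaluated at the current (x, y):
    #   grad F(x) = grad_x f - (grad^2_{xy} g) [grad^2_{yy} g]^{-1} grad_y f
    dyf = y - b                        # grad_y f
    g_yy = (1.0 + mu) * np.eye(p)      # grad^2_{yy} g (constant for this toy problem)
    g_xy = -A.T                        # grad^2_{xy} g, shape (d, p)
    return -g_xy @ np.linalg.solve(g_yy, dyf)   # grad_x f = 0 for this toy outer loss

# Single loop: one inner gradient step on y, one adaptive momentum step on x.
eta_x, eta_y, beta1, beta2, eps = 0.05, 0.2, 0.9, 0.99, 1e-8
x, y = np.zeros(d), np.zeros(p)
m, v = np.zeros(d), np.zeros(d)        # first/second moment estimates for x

for t in range(2000):
    y -= eta_y * grad_y_g(x, y)                 # inexact inner solve (single step)
    g = hypergrad(x, y)                         # hypergradient estimate
    m = beta1 * m + (1 - beta1) * g             # momentum on the hypergradient
    v = beta2 * v + (1 - beta2) * g ** 2        # diagonal adaptive matrix (Adam-style)
    x -= eta_x * m / (np.sqrt(v) + eps)         # adaptive update of the outer variable

print("outer loss:", 0.5 * np.sum((y - b) ** 2))
```

The saBiAdam variants described above would replace the explicit `np.linalg.solve` (Hessian inverse) with a stochastic approximation of the Hessian-inverse-vector product; this sketch keeps the exact solve only because the toy inner Hessian is known in closed form.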
Related papers
- Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
Gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling-based algorithm which achieves a faster convergence rate.
We validate our algorithms on both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - Accelerated Gradient Algorithms with Adaptive Subspace Search for
Instance-Faster Optimization [6.896308995955336]
Gradient-based minimax optimal algorithms have promoted the development of continuous optimization and machine learning.
In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning.
arXiv Detail & Related papers (2023-12-06T01:16:10Z) - Adaptive Mirror Descent Bilevel Optimization [25.438298531555468]
We propose a class of efficient adaptive bilevel methods based on mirror descent for nonconvex bilevel optimization.
We provide a convergence analysis for these methods under mild conditions, and prove that our methods converge in a small number of iterations.
arXiv Detail & Related papers (2023-11-08T08:17:09Z) - Fast Adaptive Federated Bilevel Optimization [14.579475552088692]
We propose a novel adaptive federated bilevel optimization algorithm (i.e., AdaFBiO) to solve distributed bilevel optimization problems.
AdaFBiO uses the unified adaptive matrices to flexibly incorporate various adaptive learning rates to update variables in both UL and LL problems.
We provide a convergence analysis framework for our AdaFBiO algorithm, and prove that it needs a sample complexity of $\tilde{O}(\epsilon^{-3})$ with a communication complexity of $\tilde{O}(\epsilon^{-2})$ to obtain an $\epsilon$-stationary point.
arXiv Detail & Related papers (2022-11-02T13:55:47Z) - Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method (SBiO-BreD) based on the Bregman distance.
We also propose an accelerated version of the SBiO-BreD method (ASBiO-BreD) using a variance-reduction technique.
arXiv Detail & Related papers (2021-07-26T16:18:43Z) - AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax
Optimization [104.96004056928474]
We propose a class of faster adaptive gradient descent ascent methods for nonconvex-strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with a mini-batch size of $O(\kappa)$.
arXiv Detail & Related papers (2021-06-30T14:47:09Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in its iterates caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave minimax problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)