AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax
Optimization
- URL: http://arxiv.org/abs/2106.16101v2
- Date: Thu, 1 Jul 2021 18:50:03 GMT
- Title: AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax
Optimization
- Authors: Feihu Huang and Heng Huang
- Abstract summary: We propose a class of faster adaptive gradient descent ascent methods for nonconvex-strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with the mini-batch size $O(\kappa)$.
- Score: 104.96004056928474
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the paper, we propose a class of faster adaptive gradient descent ascent
methods for solving the nonconvex-strongly-concave minimax problems by using
unified adaptive matrices used in the SUPER-ADAM \citep{huang2021super}.
Specifically, we propose a fast adaptive gradient descent ascent (AdaGDA) method
based on the basic momentum technique, which reaches a low sample complexity of
$O(\kappa^4\epsilon^{-4})$ for finding an $\epsilon$-stationary point without
large batches, which improves the existing result of adaptive minimax
optimization methods by a factor of $O(\sqrt{\kappa})$. Moreover, we present an
accelerated version of AdaGDA (VR-AdaGDA) method based on the momentum-based
variance reduced technique, which achieves the best known sample complexity of
$O(\kappa^3\epsilon^{-3})$ for finding an $\epsilon$-stationary point without
large batches. Further, assuming a bounded Lipschitz parameter of the objective
function, we prove that our VR-AdaGDA method reaches a lower sample complexity
of $O(\kappa^{2.5}\epsilon^{-3})$ with the mini-batch size $O(\kappa)$. In
particular, we provide an effective convergence analysis framework for our
adaptive methods based on unified adaptive matrices, which include almost all
existing adaptive learning rates.
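The abstract describes AdaGDA as a momentum-based gradient descent ascent update whose descent step in $x$ is scaled by a "unified adaptive matrix" in the sense of SUPER-ADAM. The snippet below is a minimal sketch of that structure, assuming a generic stochastic-gradient oracle; the callables `grad_x` and `grad_y`, the Adam-style diagonal choice of adaptive matrix, and all hyper-parameter values are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def adagda_step(x, y, v, w, h, grad_x, grad_y, batch,
                alpha=0.1, eta=0.01, lam=0.01, rho=0.9, eps=1e-8):
    """One AdaGDA-style step: momentum estimators of both gradients,
    with an adaptive (diagonal) matrix scaling the descent step in x."""
    gx = grad_x(x, y, batch)   # stochastic gradient w.r.t. x
    gy = grad_y(x, y, batch)   # stochastic gradient w.r.t. y

    # Basic momentum estimators of the two gradients.
    v = (1.0 - alpha) * v + alpha * gx
    w = (1.0 - alpha) * w + alpha * gy

    # Adam-like second-moment accumulator standing in for the
    # "unified adaptive matrix" A_t (any positive-definite choice fits
    # the framework; this diagonal form is only one example).
    h = rho * h + (1.0 - rho) * gx ** 2
    a_inv = 1.0 / (np.sqrt(h) + eps)

    # Descent on x (scaled by the adaptive matrix), ascent on y.
    x = x - eta * a_inv * v
    y = y + lam * w
    return x, y, v, w, h
```

Under the same assumptions, the accelerated VR-AdaGDA variant described in the abstract would replace the basic momentum estimators with STORM-style variance-reduced ones, e.g. `v = gx + (1 - alpha) * (v - grad_x(x_prev, y_prev, batch))`, where the old and new iterates are evaluated on the same mini-batch.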
Related papers
- Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax
Optimization [14.579475552088692]
We prove that our framework can be used to find an $\epsilon$-stationary point of $F(x)=\max_y f(x,y)$ with low sample complexity.
We present an effective analysis for our methods.
arXiv Detail & Related papers (2023-03-07T15:33:12Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - On Riemannian Gradient-Based Methods for Minimax Problems [24.199289678553896]
We propose a class of Riemannian gradient-based methods to solve minimax problems.
We prove that our RGDA has a sample complexity of $\tilde{O}(\kappa^4\epsilon^{-3})$.
We also show that our Acc-RSGDA achieves a lower sample complexity of $\tilde{O}(\kappa^4\epsilon^{-3})$.
arXiv Detail & Related papers (2020-10-13T00:54:00Z) - Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to
Minimax Optimization [133.53164856723782]
We propose a new accelerated zeroth-order momentum (Acc-ZOM) method for black-box mini-optimization where only function values can be obtained (a generic zeroth-order gradient estimator of this kind is sketched after this list).
Meanwhile, we propose an accelerated zeroth-order momentum descent ascent (Acc-ZOMDA) method for black-box minimax optimization, and an accelerated first-order momentum descent ascent (Acc-MDA) method for minimax optimization with accessible gradients.
In particular, our Acc-MDA can obtain a lower gradient complexity of $\tilde{O}(\kappa_y^{2.5}\epsilon^{-3})$ with a batch size $O(\kappa_y^4)$.
arXiv Detail & Related papers (2020-08-18T22:19:29Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of the gradient-free minimax optimization problem.
We show that a novel zeroth-order variance-reduced descent ascent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave minimax problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
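Several of the papers above (Acc-ZOM, Acc-ZOMDA, and the gradient-free minimax work) operate in the black-box setting where only function values are available. The following is a minimal sketch of the standard two-point zeroth-order gradient estimator such methods build on; the Gaussian sampling directions, the smoothing radius `mu`, and the toy objective are illustrative assumptions rather than the exact estimator of any one paper listed here.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_dirs=10, rng=None):
    """Estimate the gradient of f at x from function values only,
    by averaging two-point finite differences along random directions."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)                 # random direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

# Usage: plug the estimate into an ordinary gradient step.
f = lambda x: float(np.sum(x ** 2))                      # toy smooth objective
x = np.ones(5)
for _ in range(100):
    x = x - 0.1 * zo_gradient(f, x)
```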