TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax
Optimization
- URL: http://arxiv.org/abs/2210.17478v1
- Date: Mon, 31 Oct 2022 17:05:36 GMT
- Title: TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax
Optimization
- Authors: Xiang Li, Junchi Yang, Niao He
- Abstract summary: Adaptive methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner.
Current convergence analyses of gradient descent ascent for nonconvex minimax problems require careful tuning of hyper-parameters.
- Score: 24.784754071913255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive gradient methods have shown their ability to adjust the stepsizes on
the fly in a parameter-agnostic manner, and empirically achieve faster
convergence for solving minimization problems. When it comes to nonconvex
minimax optimization, however, current convergence analyses of gradient descent
ascent (GDA) combined with adaptive stepsizes require careful tuning of
hyper-parameters and the knowledge of problem-dependent parameters. Such a
discrepancy arises from the primal-dual nature of minimax problems and the
necessity of delicate time-scale separation between the primal and dual updates
in attaining convergence. In this work, we propose a single-loop adaptive GDA
algorithm called TiAda for nonconvex minimax optimization that automatically
adapts to the time-scale separation. Our algorithm is fully parameter-agnostic
and can achieve near-optimal complexities simultaneously in deterministic and
stochastic settings of nonconvex-strongly-concave minimax problems. The
effectiveness of the proposed method is further justified numerically for a
number of machine learning applications.
Related papers
- On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms [4.307128674848627]
AdaPG$^{q,r}$ is a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds.
Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations.
arXiv Detail & Related papers (2023-11-30T10:29:43Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these federated minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization [108.35402316802765]
We propose a new first-order optimization algorithm -- Accelerated Gradient-Optimistic Gradient (AG-OG) Descent Ascent.
We show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings.
We further extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings.
arXiv Detail & Related papers (2022-10-31T17:59:29Z) - Near-Optimal Algorithms for Making the Gradient Small in Stochastic
Minimax Optimization [14.719077076351377]
We study the problem of finding a near-stationary point for smooth minimax optimization.
We show that a novel algorithm called Recursive IteratioN (RAIN) achieves near-optimal rates in both the convex-concave and strongly-convex-strongly-concave cases.
arXiv Detail & Related papers (2022-08-11T16:55:26Z) - Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax
Optimization [24.784754071913255]
Adaptive algorithms like AdaGrad and AMSGrad are successful thanks to their robustness to unknown problem-specific parameters.
We show that NeAda can achieve near-optimal complexity without knowledge of problem-dependent parameters.
arXiv Detail & Related papers (2022-06-01T20:11:05Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, albeit only up to an error neighborhood.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\widetilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to improve their practical convergence rates.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex, nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave minimax problems.
Our experiments show that adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)