Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax
Optimization
- URL: http://arxiv.org/abs/2206.00743v1
- Date: Wed, 1 Jun 2022 20:11:05 GMT
- Title: Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax
Optimization
- Authors: Junchi Yang, Xiang Li, Niao He
- Abstract summary: Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization owing to their parameter-agnostic ability.
We show that NeAda achieves near-optimal gradient complexities without prior knowledge of the problem's smoothness and strong concavity parameters.
- Score: 24.784754071913255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex
optimization owing to their parameter-agnostic ability -- requiring no a priori
knowledge about problem-specific parameters nor tuning of learning rates.
However, when it comes to nonconvex minimax optimization, direct extensions of
such adaptive optimizers without proper time-scale separation may fail to work
in practice. We provide such an example proving that the simple combination of
Gradient Descent Ascent (GDA) with adaptive stepsizes can diverge if the
primal-dual stepsize ratio is not carefully chosen; hence, a fortiori, such
adaptive extensions are not parameter-agnostic. To address the issue, we
formally introduce a Nested Adaptive framework, NeAda for short, that carries
an inner loop for adaptively maximizing the dual variable with controllable
stopping criteria and an outer loop for adaptively minimizing the primal
variable. Such a mechanism can be equipped with off-the-shelf adaptive optimizers
and automatically balance the progress in the primal and dual variables.
Theoretically, for nonconvex-strongly-concave minimax problems, we show that
NeAda can achieve the near-optimal $\tilde{O}(\epsilon^{-2})$ and
$\tilde{O}(\epsilon^{-4})$ gradient complexities respectively in the
deterministic and stochastic settings, without prior information on the
problem's smoothness and strong concavity parameters. To the best of our
knowledge, this is the first algorithm that simultaneously achieves
near-optimal convergence rates and parameter-agnostic adaptation in the
nonconvex minimax setting. Numerically, we further illustrate the robustness of
the NeAda family with experiments on simple test functions and a real-world
application.
Related papers
- The Price of Adaptivity in Stochastic Convex Optimization [23.776027867314628]
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization.
We define a "price of adaptivity" (PoA) that measures the multiplicative increase in suboptimality.
arXiv Detail & Related papers (2024-02-16T18:56:41Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization [108.35402316802765]
We propose a new first-order optimization algorithm -- AcceleratedGradient-OptimisticGradient (AG-OG) Ascent.
We show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings.
We further extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings.
arXiv Detail & Related papers (2022-10-31T17:59:29Z)
- TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization [24.784754071913255]
Adaptive methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner.
Current convergence analyses of gradient descent ascent for nonconvex minimax problems, however, require careful tuning of parameters.
arXiv Detail & Related papers (2022-10-31T17:05:36Z)
- Adaptive first-order methods revisited: Convex optimization without Lipschitz requirements [34.70493893170415]
We propose a new family of adaptive first-order methods for convex minimization problems that fail to be Lipschitz continuous or smooth.
The proposed method - which we call adaptive mirror descent (AdaMir) - aims to close this gap by concurrently achieving min-max optimal rates in problems that are relatively continuous or smooth.
arXiv Detail & Related papers (2021-07-16T16:59:40Z)
- AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization [104.96004056928474]
We propose a class of faster adaptive gradient descent ascent methods for nonconvex-strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with mini-batch size $O(\kappa)$.
arXiv Detail & Related papers (2021-06-30T14:47:09Z)
- Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities [32.51470158863247]
Our main algorithms, AdaACSA and AdaAGD+, are accelerated methods for constrained convex optimization.
We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard non-accelerated convergence rate.
arXiv Detail & Related papers (2020-07-17T09:10:21Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster, but only up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
- Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex and nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)