Don't Fix What ain't Broke: Near-optimal Local Convergence of
Alternating Gradient Descent-Ascent for Minimax Optimization
- URL: http://arxiv.org/abs/2102.09468v1
- Date: Thu, 18 Feb 2021 16:39:35 GMT
- Title: Don't Fix What ain't Broke: Near-optimal Local Convergence of
Alternating Gradient Descent-Ascent for Minimax Optimization
- Authors: Guodong Zhang, Yuanhao Wang, Laurent Lessard, Roger Grosse
- Abstract summary: Alternating gradient descent-ascent (Alt-GDA) is superior to its simultaneous counterpart (Sim-GDA) in many settings.
In particular, we prove that Alt-GDA achieves a near-optimal local convergence rate for strongly-convex strongly-concave problems.
- Score: 24.236441934607154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimax optimization has recently gained a lot of attention as adversarial
architectures and algorithms proliferate. Often, smooth minimax games proceed
by simultaneous or alternating gradient updates. Although algorithms with
alternating updates are commonly used in practice for many applications (e.g.,
GAN training), the majority of existing theoretical analyses focus on
simultaneous algorithms. In this paper, we study alternating gradient
descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to
its simultaneous counterpart (Sim-GDA) in many settings. In particular, we
prove that Alt-GDA achieves a near-optimal local convergence rate for
strongly-convex strongly-concave problems while Sim-GDA converges with a much
slower rate. Moreover, we show that the acceleration effect of alternating
updates remains when the minimax problem has only strong concavity in the dual
variables. Numerical experiments on quadratic minimax games validate our
claims. Additionally, we demonstrate that alternating updates speed up GAN
training significantly and the use of optimism only helps for simultaneous
algorithms.
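The only difference between the two methods in the abstract is whether the ascent step uses the freshly updated descent iterate. Below is a minimal NumPy sketch of that difference on a toy strongly-convex strongly-concave quadratic game; the particular quadratic form, step size, and iteration budget are illustrative assumptions, not the authors' experimental setup.

```python
# Minimal sketch (not the authors' code): Sim-GDA vs. Alt-GDA on a toy
# strongly-convex strongly-concave quadratic game
#     f(x, y) = (mu/2)||x||^2 + x^T B y - (mu/2)||y||^2,
# whose unique saddle point is (x, y) = (0, 0).  Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, mu = 5, 0.2
B = rng.standard_normal((d, d))
B /= np.linalg.norm(B, 2)        # normalize the coupling so its Lipschitz constant is 1

x0 = rng.standard_normal(d)      # shared initial point for a fair comparison
y0 = rng.standard_normal(d)

def grad_x(x, y):                # gradient of f with respect to the min player x
    return mu * x + B @ y

def grad_y(x, y):                # gradient of f with respect to the max player y
    return B.T @ x - mu * y

def run(alternating, eta=0.2, steps=500):
    x, y = x0.copy(), y0.copy()
    for _ in range(steps):
        if alternating:          # Alt-GDA: the ascent step sees the freshly updated x
            x = x - eta * grad_x(x, y)
            y = y + eta * grad_y(x, y)
        else:                    # Sim-GDA: both players update from the same iterate
            gx, gy = grad_x(x, y), grad_y(x, y)
            x, y = x - eta * gx, y + eta * gy
    return np.linalg.norm(np.concatenate([x, y]))   # distance to the saddle point

print("Sim-GDA distance to saddle:", run(alternating=False))
print("Alt-GDA distance to saddle:", run(alternating=True))
```

With the same step size and iteration budget, the alternating variant typically ends up far closer to the saddle point (0, 0), which is the qualitative gap the paper quantifies: a near-optimal local rate for Alt-GDA versus a markedly slower one for Sim-GDA.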
Related papers
- Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems.
Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Fundamental Benefit of Alternating Updates in Minimax Optimization [15.170534286366744]
The Gradient Descent-Ascent (GDA) algorithm is designed to solve minimax optimization problems.
While Alt-GDA is commonly observed to converge faster in terms of iterations, the performance gap between the two is not yet well understood.
We present fine-grained convergence analyses of both algorithms for strongly-convex-strongly-concave and Lipschitz-gradient objectives.
arXiv Detail & Related papers (2024-02-16T06:41:35Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Distributed Extra-gradient with Optimal Complexity and Communication
Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, which is a de facto algorithm for monotone VI problems, has not been designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z) - Min-Max Optimization under Delays [26.830212508878162]
Delays and asynchrony are inevitable in large-scale machine-learning problems.
No analogous theory is available for min-max optimization.
We show that even small delays can cause prominent algorithms like Extra-gradient to diverge.
arXiv Detail & Related papers (2023-07-13T16:39:01Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its application in a wide range of machine learning tasks.
This paper develops two novel decentralized minimax optimization algorithms for the finite-sum nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization [108.35402316802765]
We propose a new first-order optimization algorithm -- Accelerated Gradient-Optimistic Gradient (AG-OG) Descent Ascent.
We show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings.
We extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings.
arXiv Detail & Related papers (2022-10-31T17:59:29Z) - HessianFR: An Efficient Hessian-based Follow-the-Ridge Algorithm for
Minimax Optimization [18.61046317693516]
HessianFR is an efficient Hessian-based Follow-the-Ridge algorithm with theoretical guarantees.
Experiments on training generative adversarial networks (GANs) have been conducted on both synthetic and real-world large-scale image datasets.
arXiv Detail & Related papers (2022-05-23T04:28:52Z) - Federated Minimax Optimization: Improved Convergence Analyses and
Algorithms [32.062312674333775]
We consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs.
We provide a novel and tighter analysis of the algorithm, improving the convergence and communication guarantees in the existing literature.
arXiv Detail & Related papers (2022-03-09T16:21:31Z) - Solve Minimax Optimization by Anderson Acceleration [5.900781483345349]
Gradient descent ascent (GDA) is the most commonly used algorithm due to its simplicity.
We propose a new minimax optimization framework, GDA-AM, that views the GDA dynamics as a fixed-point iteration.
We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions.
arXiv Detail & Related papers (2021-10-06T02:08:54Z)