Fundamental Benefit of Alternating Updates in Minimax Optimization
- URL: http://arxiv.org/abs/2402.10475v2
- Date: Mon, 15 Jul 2024 08:21:28 GMT
- Title: Fundamental Benefit of Alternating Updates in Minimax Optimization
- Authors: Jaewook Lee, Hanseul Cho, Chulhee Yun
- Abstract summary: The Gradient Descent-Ascent (GDA) algorithm is designed to solve minimax optimization problems.
While Alt-GDA is commonly observed to converge faster, the performance gap between the two is not yet well understood.
We present fine-grained convergence analyses of both algorithms for strongly-convex-strongly-concave and Lipschitz-gradient objectives.
- Score: 15.170534286366744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Gradient Descent-Ascent (GDA) algorithm, designed to solve minimax optimization problems, takes the descent and ascent steps either simultaneously (Sim-GDA) or alternately (Alt-GDA). While Alt-GDA is commonly observed to converge faster, the performance gap between the two is not yet well understood theoretically, especially in terms of global convergence rates. To address this theory-practice gap, we present fine-grained convergence analyses of both algorithms for strongly-convex-strongly-concave and Lipschitz-gradient objectives. Our new iteration complexity upper bound of Alt-GDA is strictly smaller than the lower bound of Sim-GDA; i.e., Alt-GDA is provably faster. Moreover, we propose Alternating-Extrapolation GDA (Alex-GDA), a general algorithmic framework that subsumes Sim-GDA and Alt-GDA, whose main idea is to alternately take gradients from extrapolations of the iterates. We show that Alex-GDA satisfies a smaller iteration complexity bound, identical to that of the Extra-gradient method, while requiring fewer gradient computations. We also prove that Alex-GDA enjoys linear convergence for bilinear problems, for which both Sim-GDA and Alt-GDA fail to converge at all.
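The contrast between the three update rules can be made concrete on a bilinear toy problem f(x, y) = x^T B y, the setting where the abstract says Sim-GDA and Alt-GDA fail to converge while Alex-GDA converges linearly. The sketch below is not the authors' code: the Alex-GDA update is one plausible reading of "alternately take gradients from extrapolations of the iterates", and the step size eta, extrapolation coefficient gamma, and spectrum of B are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) comparing Sim-GDA,
# Alt-GDA, and an Alex-GDA-style update on the bilinear toy problem
# f(x, y) = x^T B y, whose unique saddle point is (0, 0).
import numpy as np

rng = np.random.default_rng(0)
d = 5
# Build B with singular values in [1, 2] so step-size effects are easy to see.
s = np.linspace(1.0, 2.0, d)
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
B = U @ np.diag(s) @ V.T
eta = 0.1  # shared step size (illustrative assumption)

def grad_x(x, y):  # gradient of f in x: B y
    return B @ y

def grad_y(x, y):  # gradient of f in y: B^T x
    return B.T @ x

def sim_gda(x, y, steps):
    """Simultaneous GDA: both players update from the same iterate."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

def alt_gda(x, y, steps):
    """Alternating GDA: the ascent step already sees the fresh x."""
    for _ in range(steps):
        x = x - eta * grad_x(x, y)
        y = y + eta * grad_y(x, y)
    return x, y

def alex_gda(x, y, steps, gamma=1.0):
    """Alex-GDA-style update (a guess at the paper's scheme): each player
    takes its gradient at an extrapolation of the other player's iterates.
    Like plain GDA, it needs only one x-gradient and one y-gradient per step."""
    y_prev = y.copy()
    for _ in range(steps):
        y_ext = y + gamma * (y - y_prev)     # extrapolate the dual iterate
        x_new = x - eta * grad_x(x, y_ext)   # descent step at extrapolated y
        x_ext = x_new + gamma * (x_new - x)  # extrapolate the fresh primal iterate
        y_new = y + eta * grad_y(x_ext, y)   # ascent step at extrapolated x
        y_prev, x, y = y, x_new, y_new
    return x, y

x0, y0 = rng.standard_normal(d), rng.standard_normal(d)
for name, algo in [("Sim-GDA", sim_gda), ("Alt-GDA", alt_gda), ("Alex-GDA", alex_gda)]:
    x, y = algo(x0.copy(), y0.copy(), steps=2000)
    dist = np.sqrt(np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2)
    print(f"{name:8s} distance to the saddle point after 2000 steps: {dist:.2e}")
```

With these assumed settings the printout matches the qualitative picture in the abstract: Sim-GDA's distance to the saddle point blows up, Alt-GDA's stays roughly constant, and the extrapolated alternating scheme shrinks it geometrically. Setting gamma = 0 recovers Alt-GDA exactly, so the extrapolation coefficient is the knob that separates the two variants in this sketch.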
Related papers
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Randomized Stochastic Gradient Descent Ascent [37.887266927498395]
An increasing number of machine learning problems, such as robust or adversarial variants of existing algorithms, require minimizing a loss function that is itself defined as a maximum.
We propose RSGDA (Randomized SGDA), a variant of ESGDA with a randomized loop size and a simpler theoretical analysis.
arXiv Detail & Related papers (2021-11-25T16:44:19Z) - Escaping Saddle Points in Nonconvex Minimax Optimization via
Cubic-Regularized Gradient Descent-Ascent [13.565010494569243]
The gradient descent-ascent (GDA) algorithm has been widely applied to solve nonconvex minimax optimization problems.
We develop the first GDA-type algorithm for escaping saddle points in nonconvex-strongly-concave minimax optimization.
arXiv Detail & Related papers (2021-10-14T00:36:44Z) - Solve Minimax Optimization by Anderson Acceleration [5.900781483345349]
Gradient descent ascent (GDA) is the most commonly used algorithm due to its simplicity.
We propose a new minimax optimization framework, GDA-AM, that views the GDA dynamics as a fixed-point iteration.
We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions.
arXiv Detail & Related papers (2021-10-06T02:08:54Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimal solution under the same delay assumptions.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Don't Fix What ain't Broke: Near-optimal Local Convergence of
Alternating Gradient Descent-Ascent for Minimax Optimization [24.236441934607154]
Alternating gradient descent-ascent (Alt-GDA) is superior to its simultaneous counterpart (Sim-GDA) in many settings.
In particular, we prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave problems.
arXiv Detail & Related papers (2021-02-18T16:39:35Z) - Proximal Gradient Descent-Ascent: Variable Convergence under KŁ
Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems.
This paper fills such a gap by studying the convergence of proximal GDA under the Kurdyka-Łojasiewicz (KŁ) geometry.
arXiv Detail & Related papers (2021-02-09T05:35:53Z) - Global Convergence and Variance-Reduced Optimization for a Class of
Nonconvex-Nonconcave Minimax Problems [39.13924898972611]
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.
GDA algorithms with constant step size can potentially diverge even in the convex setting.
The AGDA algorithm is shown to converge globally at a sublinear rate.
arXiv Detail & Related papers (2020-02-22T04:20:37Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm provably achieves the best-available convergence rate for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments on GAN training show that the advantage of adaptive gradient algorithms over their non-adaptive counterparts can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)