Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient
Methods
- URL: http://arxiv.org/abs/2202.07262v1
- Date: Tue, 15 Feb 2022 09:17:39 GMT
- Title: Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient
Methods
- Authors: Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, Nicolas Loizou
- Abstract summary: Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA, such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA).
- Score: 73.35353358543507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent
algorithms for solving min-max optimization and variational inequalities
problems (VIP) appearing in various machine learning tasks. The success of the
method led to several advanced extensions of the classical SGDA, including
variants with arbitrary sampling, variance reduction, coordinate randomization,
and distributed variants with compression, which were extensively studied in
the literature, especially during the last few years. In this paper, we propose
a unified convergence analysis that covers a large variety of stochastic
gradient descent-ascent methods, which so far have required different
intuitions, have different applications and have been developed separately in
various communities. A key to our unified framework is a parametric assumption
on the stochastic estimates. Via our general theoretical framework, we either
recover the sharpest known rates for the known special cases or tighten them.
Moreover, to illustrate the flexibility of our approach we develop several new
variants of SGDA such as a new variance-reduced method (L-SVRGDA), new
distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a
new method with coordinate randomization (SEGA-SGDA). Although variants of the
new methods are known for solving minimization problems, they were never
considered or analyzed for solving min-max problems and VIPs. We also
demonstrate the most important properties of the new methods through extensive
numerical experiments.
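The core SGDA update alternates a gradient descent step on the min variable with a gradient ascent step on the max variable, using noisy gradient estimates. As a minimal illustrative sketch (not the paper's analyzed setting), the following runs SGDA on a toy strongly-convex-strongly-concave saddle-point problem with simulated Gaussian gradient noise; the objective, step size, and noise model are all assumptions chosen for illustration.

```python
import numpy as np

# Toy strongly-convex-strongly-concave saddle point:
#   min_x max_y  f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2
# The unique saddle point is (x, y) = (0, 0).
rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d)) / np.sqrt(d)

def grad_x(x, y):  # ∇_x f(x, y) = x + A y
    return x + A @ y

def grad_y(x, y):  # ∇_y f(x, y) = A^T x - y
    return A.T @ x - y

x, y = np.ones(d), np.ones(d)
step, noise = 0.1, 0.01
for _ in range(2000):
    gx = grad_x(x, y) + noise * rng.standard_normal(d)  # stochastic estimate
    gy = grad_y(x, y) + noise * rng.standard_normal(d)
    x = x - step * gx   # descent on the min variable
    y = y + step * gy   # ascent on the max variable

print(np.linalg.norm(x), np.linalg.norm(y))  # both norms shrink toward 0
```

With a constant step size, the iterates contract toward the saddle point up to a noise-dependent neighborhood, which is the qualitative behavior the paper's unified analysis makes precise under its parametric assumption on the stochastic estimates.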
Related papers
- High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients for deriving good high-probability guarantees.
However, clipping can spoil the convergence of popular methods for composite and distributed optimization.
arXiv Detail & Related papers (2023-10-03T07:49:17Z) - Stochastic Methods in Variational Inequalities: Ergodicity, Bias and
Refinements [19.524063429548278]
Stochastic Extragradient (SEG) and Stochastic Gradient Descent-Ascent (SGDA) are preeminent algorithms for min-max optimization and variational inequalities problems.
Our work endeavors to quantify the structures intrinsic to these algorithms.
By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem.
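The Markov-chain view can be made concrete with a toy simulation (an assumption-laden illustration, not the paper's construction): with a constant step size, the iterates of stochastic GDA do not converge pointwise but form an ergodic chain, and their time average stabilizes, which is the Law-of-Large-Numbers effect referenced above.

```python
import numpy as np

# Constant-step stochastic GDA on the scalar saddle point
#   f(x, y) = 0.5*x^2 + x*y - 0.5*y^2  (saddle at the origin),
# with additive Gaussian gradient noise. The iterate pair (x, y) is a
# time-homogeneous Markov chain; we track its ergodic (time) average.
rng = np.random.default_rng(3)
x, y, step = 2.0, -2.0, 0.1
n = 20000
avg = np.zeros(2)
for _ in range(n):
    gx = x + y + rng.normal(scale=0.5)   # noisy ∇_x f = x + y
    gy = x - y + rng.normal(scale=0.5)   # noisy ∇_y f = x - y
    x -= step * gx
    y += step * gy
    avg += np.array([x, y]) / n

print(avg)  # the ergodic average concentrates, even though (x, y) keeps fluctuating
```

For this linear toy the stationary mean coincides with the saddle point; the cited paper's contribution is characterizing such stationary distributions, including their bias, in far more general settings.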
arXiv Detail & Related papers (2023-06-28T18:50:07Z) - Sublinear Convergence Rates of Extragradient-Type Methods: A Survey on
Classical and Recent Developments [12.995632804090198]
The extragradient (EG) is a well-known method to approximate solutions of saddle-point problems.
Recently, these methods have gained popularity due to new applications in machine learning and robust optimization.
We provide a unified convergence analysis for different classes of algorithms, with an emphasis on sublinear best-iterate and last-iterate convergence rates.
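The extragradient idea is a look-ahead step: evaluate the operator at an extrapolated point, then update from the original point using that look-ahead gradient. The following deterministic sketch (toy problem and step size are assumptions for illustration) shows EG converging on a bilinear problem where plain simultaneous gradient descent-ascent famously diverges.

```python
import numpy as np

# Bilinear saddle point  min_x max_y  x^T A y  with A = I, so all
# singular values equal 1 and the contraction rate is easy to see.
d = 4
A = np.eye(d)

def F(z):  # monotone operator F(z) = (∇_x f, -∇_y f) for z = (x, y)
    x, y = z[:d], z[d:]
    return np.concatenate([A @ y, -A.T @ x])

z = np.ones(2 * d)
eta = 0.2
for _ in range(3000):
    z_half = z - eta * F(z)   # extrapolation (look-ahead) step
    z = z - eta * F(z_half)   # update using the look-ahead operator value

print(np.linalg.norm(z))  # converges to 0, the unique saddle point
```

Plain GDA on this problem multiplies the distance to the saddle by sqrt(1 + eta^2) > 1 per step and spirals outward; the extra evaluation is exactly what restores contraction.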
arXiv Detail & Related papers (2023-03-30T07:04:22Z) - Single-Call Stochastic Extragradient Methods for Structured Non-monotone
Variational Inequalities: Improved Analysis under Weaker Conditions [21.681130192954583]
Single-call extragradient methods are one of the most efficient algorithms for solving large-scale min-max optimization problems.
We provide convergence guarantees for two classes of VIPs: (i) quasi-strongly monotone problems (a generalization of strongly monotone problems) and (ii) weak Minty variational inequalities (a generalization of monotone and Minty VIPs).
Our convergence analysis holds under the arbitrary sampling paradigm, which includes importance sampling and various mini-batching strategies as special cases.
arXiv Detail & Related papers (2023-02-27T18:53:28Z) - SARAH-based Variance-reduced Algorithm for Stochastic Finite-sum
Cocoercive Variational Inequalities [137.6408511310322]
We consider the problem of finite-sum cocoercive variational inequalities.
For strongly monotone problems, the proposed SARAH-based method achieves linear convergence to a solution.
arXiv Detail & Related papers (2022-10-12T08:04:48Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
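The clipping operator at the heart of these methods simply rescales a stochastic gradient whose norm exceeds a threshold, which bounds the impact of heavy-tailed noise spikes. A hypothetical minimal sketch (the toy objective, threshold, and Student-t noise model are assumptions for illustration, not the paper's setup):

```python
import numpy as np

def clip(g, lam):
    """Rescale g so its Euclidean norm never exceeds lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

# Clipped-SGDA on the scalar toy f(x, y) = 0.5*x^2 + x*y - 0.5*y^2,
# with heavy-tailed gradient noise (Student-t with 2 degrees of freedom,
# which has infinite variance).
rng = np.random.default_rng(2)
x, y, step, lam = 5.0, -5.0, 0.05, 10.0
for _ in range(5000):
    gx = x + y + 0.1 * rng.standard_t(2)   # noisy ∇_x f
    gy = x - y + 0.1 * rng.standard_t(2)   # noisy ∇_y f
    x -= step * clip(gx, lam)
    y += step * clip(gy, lam)

print(abs(x), abs(y))  # both hover near the saddle at 0 despite the heavy tails
```

Without clipping, a single extreme noise draw can throw the iterates arbitrarily far; with clipping, each step's displacement is bounded by step * lam, which is what enables the high-probability guarantees described above.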
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Stochastic Extragradient: General Analysis and Improved Rates [23.29653334470774]
The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequalities problems.
We develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner.
Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions.
arXiv Detail & Related papers (2021-11-16T16:49:31Z) - Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Minimax
Problems [80.46370778277186]
Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks.
We develop a communication-efficient distributed extragradient algorithm, LocalAdaSient, with an adaptive learning rate suitable for solving convex-concave minimax problems in the Parameter-Server model.
We demonstrate its efficacy through several experiments in both the homogeneous and heterogeneous settings.
arXiv Detail & Related papers (2021-06-18T09:42:05Z) - A Unified Analysis of Stochastic Gradient Methods for Nonconvex
Federated Optimization [16.714109768541785]
We provide a single analysis covering all SGD variants that satisfy our assumptions in the nonconvex regime.
We also provide a unified analysis yielding faster linear convergence in the nonconvex regime under the PL condition.
arXiv Detail & Related papers (2020-06-12T08:58:03Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.