Stochastic Extragradient: General Analysis and Improved Rates
- URL: http://arxiv.org/abs/2111.08611v1
- Date: Tue, 16 Nov 2021 16:49:31 GMT
- Title: Stochastic Extragradient: General Analysis and Improved Rates
- Authors: Eduard Gorbunov, Hugo Berard, Gauthier Gidel, Nicolas Loizou
- Abstract summary: The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequality problems.
We develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner.
Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions.
- Score: 23.29653334470774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Stochastic Extragradient (SEG) method is one of the most popular
algorithms for solving min-max optimization and variational inequality
problems (VIPs) appearing in various machine learning tasks. However, several
important questions regarding the convergence properties of SEG are still open,
including the sampling of stochastic gradients, mini-batching, convergence
guarantees for monotone finite-sum variational inequalities with possibly
non-monotone terms, and others. To address these questions, in this paper, we
develop a novel theoretical framework that allows us to analyze several
variants of SEG in a unified manner. Besides standard setups, like Same-Sample
SEG under Lipschitzness and monotonicity or Independent-Samples SEG under
uniformly bounded variance, our approach allows us to analyze variants of SEG
that were never explicitly considered in the literature before. Notably, we
analyze SEG with arbitrary sampling which includes importance sampling and
various mini-batching strategies as special cases. Our rates for the new
variants of SEG outperform the current state-of-the-art convergence guarantees
and rely on less restrictive assumptions.
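To make the method concrete, here is a minimal NumPy sketch of one same-sample SEG step for a finite-sum VIP; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def seg_step(z, operators, gamma1, gamma2, rng):
    """One same-sample Stochastic Extragradient (SEG) step: a sketch.

    z         -- current iterate (min and max variables concatenated)
    operators -- per-sample operators g_i; the full operator is the
                 average F(z) = mean_i g_i(z), as in finite-sum VIPs
    gamma1    -- extrapolation step size
    gamma2    -- update step size
    rng       -- NumPy random generator
    """
    i = rng.integers(len(operators))   # sample one component index
    g = operators[i]
    z_half = z - gamma1 * g(z)         # extrapolation step
    return z - gamma2 * g(z_half)      # update step, reusing the same sample
```

The Independent-Samples variant mentioned above would instead draw a fresh index for the second operator evaluation.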
Related papers
- Stochastic Extragradient with Random Reshuffling: Improved Convergence for Variational Inequalities [40.1331539540764]
We provide a convergence analysis of SEG with Random Reshuffling (SEG-RR) for three classes of VIPs.
We derive conditions under which SEG-RR achieves a faster convergence rate than SEG with uniform with-replacement sampling.
In the monotone setting, our analysis of SEG-RR guarantees convergence to an arbitrary accuracy without large batch sizes.
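A hedged sketch of the reshuffling schedule, reusing the finite-sum setup from the sketch above: each epoch visits the component operators in a fresh random order, without replacement.

```python
import numpy as np

def seg_rr_epoch(z, operators, gamma1, gamma2, rng):
    """One epoch of SEG with Random Reshuffling (SEG-RR): a sketch."""
    for i in rng.permutation(len(operators)):  # without-replacement order
        g = operators[i]
        z_half = z - gamma1 * g(z)             # extrapolation
        z = z - gamma2 * g(z_half)             # update with the same component
    return z
```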
arXiv Detail & Related papers (2024-03-11T20:35:52Z) - High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients for deriving good high-probability guarantees.
However, clipping can spoil the convergence of popular methods for composite and distributed optimization.
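For orientation, the clipping operator used in this line of work typically rescales a vector onto the ball of radius lambda; a minimal sketch:

```python
import numpy as np

def clip(v, lam):
    """Return v rescaled to have norm at most lam (no-op inside the ball)."""
    norm = np.linalg.norm(v)
    return v if norm <= lam else (lam / norm) * v
```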
arXiv Detail & Related papers (2023-10-03T07:49:17Z) - Stochastic Methods in Variational Inequalities: Ergodicity, Bias and
Refinements [19.524063429548278]
Stochastic Extragradient (SEG) and Stochastic Gradient Descent-Ascent (SGDA) are preeminent algorithms for min-max optimization and variational inequality problems.
Our work endeavors to quantify the probabilistic structures intrinsic to these algorithms.
By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem.
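One practical reading of the Markov-chain viewpoint: with a constant step size the iterates keep oscillating around the solution, but their time average stabilizes. A sketch under illustrative additive Gaussian noise (not the paper's exact setting):

```python
import numpy as np

def averaged_seg(z0, operator, gamma, num_steps, rng, noise_std=0.1):
    """Constant step-size SEG with iterate (time) averaging: a sketch.

    `operator` is the deterministic operator F; stochasticity is modeled
    here by additive Gaussian noise, purely for illustration.
    """
    z = z0.astype(float)
    z_bar = np.zeros_like(z)
    for t in range(1, num_steps + 1):
        g1 = operator(z) + noise_std * rng.standard_normal(z.shape)
        z_half = z - gamma * g1                   # extrapolation
        g2 = operator(z_half) + noise_std * rng.standard_normal(z.shape)
        z = z - gamma * g2                        # update
        z_bar += (z - z_bar) / t                  # running (time) average
    return z_bar
```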
arXiv Detail & Related papers (2023-06-28T18:50:07Z) - Single-Call Stochastic Extragradient Methods for Structured Non-monotone
Variational Inequalities: Improved Analysis under Weaker Conditions [21.681130192954583]
Single-call stochastic extragradient methods are among the most efficient algorithms for solving large-scale min-max optimization problems.
We provide convergence guarantees for two classes of VIPs: (i) quasi-strongly monotone problems (a generalization of strongly monotone problems) and (ii) weak Minty variational inequalities (a generalization of monotone and Minty VIPs).
Our convergence analysis holds under the arbitrary sampling paradigm, which includes importance sampling and various mini-batching strategies as special cases.
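A minimal sketch of the single-call idea, assuming Popov's past-extragradient template: the extrapolation step reuses the operator value stored from the previous iteration, so each iteration makes only one fresh stochastic evaluation.

```python
import numpy as np

def past_seg(z0, sample_operator, gamma, num_steps, rng):
    """Single-call stochastic past extragradient: a sketch.

    sample_operator(z, rng) returns a stochastic estimate of F(z).
    """
    z = z0.astype(float)
    g_prev = np.zeros_like(z)                  # stand-in for the initial value
    for _ in range(num_steps):
        z_half = z - gamma * g_prev            # extrapolate with stored value
        g_prev = sample_operator(z_half, rng)  # the single fresh evaluation
        z = z - gamma * g_prev                 # update
    return z
```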
arXiv Detail & Related papers (2023-02-27T18:53:28Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
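A hedged sketch of one clipped-SEG step, combining the SEG template with a clipping operator like the one sketched earlier; names and signatures are illustrative.

```python
import numpy as np

def clipped_seg_step(z, sample_operator, gamma, lam, rng):
    """One clipped-SEG step: clip each stochastic operator value before use."""
    def clip(v):
        n = np.linalg.norm(v)
        return v if n <= lam else (lam / n) * v

    z_half = z - gamma * clip(sample_operator(z, rng))     # extrapolation
    return z - gamma * clip(sample_operator(z_half, rng))  # update
```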
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient
Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA).
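For reference, a minimal sketch of one simultaneous SGDA step; the gradient-oracle signatures are illustrative.

```python
import numpy as np

def sgda_step(x, y, grad_x, grad_y, gamma, rng):
    """One Stochastic Gradient Descent-Ascent (SGDA) step: a sketch.

    grad_x/grad_y return stochastic gradients of the objective f(x, y).
    """
    x_new = x - gamma * grad_x(x, y, rng)  # descent on the min player
    y_new = y + gamma * grad_y(x, y, rng)  # ascent on the max player
    return x_new, y_new                    # simultaneous update
```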
arXiv Detail & Related papers (2022-02-15T09:17:39Z) - Result Diversification by Multi-objective Evolutionary Algorithms with
Theoretical Guarantees [94.72461292387146]
We propose to reformulate the result diversification problem as a bi-objective search problem, and solve it by a multi-objective evolutionary algorithm (EA).
We theoretically prove that the GSEMO can achieve the optimal polynomial-time approximation ratio, $1/2$.
When the objective function changes dynamically, the GSEMO can maintain this approximation ratio in polynomial running time, addressing the open question proposed by Borodin et al.
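A rough GSEMO sketch for bi-objective maximization over bitstrings; the encoding, mutation, and dominance rule are textbook simplifications, not the paper's exact formulation.

```python
import numpy as np

def gsemo(f, n, num_iters, rng):
    """Minimal GSEMO sketch: evolve a set of non-dominated bitstrings.

    f(x) must return a pair of objective values to maximize, e.g.
    (diversity-aware quality, -subset size) for result diversification.
    """
    def dominates(a, b):  # weak Pareto dominance for maximization
        return all(ai >= bi for ai, bi in zip(a, b)) and a != b

    x = (rng.random(n) < 0.5).astype(int)
    population = [(tuple(x), tuple(f(x)))]
    for _ in range(num_iters):
        parent, _ = population[rng.integers(len(population))]
        child = np.array(parent)
        flips = rng.random(n) < 1.0 / n        # standard bit mutation
        child[flips] ^= 1
        fc = tuple(f(child))
        # keep the child only if nothing in the population weakly dominates it
        if not any(dominates(fv, fc) or fv == fc for _, fv in population):
            population = [(s, fv) for s, fv in population
                          if not dominates(fc, fv)]
            population.append((tuple(child), fc))
    return population
```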
arXiv Detail & Related papers (2021-10-18T14:00:22Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
However, the theoretical understanding of "vanilla" PG remains limited.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
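For orientation, expected co-coercivity is a condition of roughly the following form (a paraphrase; see the paper for the precise statement), where $g_\xi$ is the stochastic estimator of the operator $F$ and $z^*$ is a solution with $F(z^*) = 0$:

```latex
\mathbb{E}_{\xi}\!\left[\|g_{\xi}(z) - g_{\xi}(z^*)\|^2\right]
  \;\le\; \ell \, \langle F(z),\, z - z^* \rangle
  \quad \text{for all } z .
```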
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - A Unified Analysis of Stochastic Gradient Methods for Nonconvex
Federated Optimization [16.714109768541785]
We provide a single analysis for all SGD variants that satisfy a unified assumption in the nonconvex regime.
We also provide a unified analysis for obtaining faster linear convergence in the nonconvex regime under the PL condition.
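The PL (Polyak-Lojasiewicz) condition referenced here is the standard inequality: for some $\mu > 0$,

```latex
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu \left( f(x) - f^\ast \right)
\quad \text{for all } x .
```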
arXiv Detail & Related papers (2020-06-12T08:58:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.