Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity
- URL: http://arxiv.org/abs/2107.00052v1
- Date: Wed, 30 Jun 2021 18:32:46 GMT
- Title: Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity
- Authors: Nicolas Loizou, Hugo Berard, Gauthier Gidel, Ioannis Mitliagkas, Simon
Lacoste-Julien
- Abstract summary: We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
- Score: 49.66890309455787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two of the most prominent algorithms for solving unconstrained smooth games
are the classical stochastic gradient descent-ascent (SGDA) and the recently
introduced stochastic consensus optimization (SCO) (Mescheder et al., 2017).
SGDA is known to converge to a stationary point for specific classes of games,
but current convergence analyses require a bounded variance assumption. SCO is
used successfully for solving large-scale adversarial problems, but its
convergence guarantees are limited to its deterministic variant. In this work,
we introduce the expected co-coercivity condition, explain its benefits, and
provide the first last-iterate convergence guarantees of SGDA and SCO under
this condition for solving a class of stochastic variational inequality
problems that are potentially non-monotone. We prove linear convergence of both
methods to a neighborhood of the solution when they use constant step-size, and
we propose insightful stepsize-switching rules to guarantee convergence to the
exact solution. In addition, our convergence guarantees hold under the
arbitrary sampling paradigm, and as such, we give insights into the complexity
of minibatching.
Related papers
- An Inexact Halpern Iteration with Application to Distributionally Robust
Optimization [9.529117276663431]
We investigate the inexact variants of the scheme in both deterministic and deterministic convergence settings.
We show that by choosing the inexactness appropriately, the inexact schemes admit an $O(k-1) convergence rate in terms of the (expected) residue norm.
arXiv Detail & Related papers (2024-02-08T20:12:47Z) - Single-Call Stochastic Extragradient Methods for Structured Non-monotone
Variational Inequalities: Improved Analysis under Weaker Conditions [21.681130192954583]
Single-call extragradient methods are one of the most efficient algorithms for solving large-scale min-max optimization problems.
We provide convergence guarantees for two classes of VIPs: (i) quasi-strongly monotone problems (a generalization of strongly monotone problems) and (ii) weak Minty variational inequalities (a generalization of monotone and Minty VIPs)
Our convergence analysis holds under the arbitrary sampling paradigm, which includes importance sampling and various mini-batching strategies as special cases.
arXiv Detail & Related papers (2023-02-27T18:53:28Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Distributed Stochastic Optimization under a General Variance Condition [13.911633636387059]
Distributed optimization has drawn great attention recently due to its effectiveness in solving largescale machine learning problems.
We revisit the classical Federated Averaging (Avg) and establish the convergence results under only a mild variance for smooth non objective functions.
Almost a stationary convergence point is also established under the gradients condition.
arXiv Detail & Related papers (2023-01-30T05:48:09Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - On the Convergence of mSGD and AdaGrad for Stochastic Optimization [0.696125353550498]
convex descent (SGD) has been intensively developed and extensively applied in machine learning in the past decade.
Some modified SGD-type algorithms, which outperform the SGD in many competitions and applications in terms of convergence rate and accuracy, such as momentum-based SGD (mSGD) and adaptive gradient optimization (AdaGrad)
We focus on convergence analysis of mSGD and AdaGrad for any smooth (possibly non-possibly non-possibly non-possibly) loss functions in machine learning.
arXiv Detail & Related papers (2022-01-26T22:02:21Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $mathcal O (1/epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general optimization procedure is studied, unifying several variants of the gradient descent such as, among others, the heavy ball method, the Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
The avoidance is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.