Stochastic Methods in Variational Inequalities: Ergodicity, Bias and
Refinements
- URL: http://arxiv.org/abs/2306.16502v1
- Date: Wed, 28 Jun 2023 18:50:07 GMT
- Title: Stochastic Methods in Variational Inequalities: Ergodicity, Bias and
Refinements
- Authors: Emmanouil-Vasileios Vlatakis-Gkaragkounis, Angeliki Giannou, Yudong
Chen, Qiaomin Xie
- Abstract summary: Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) are preeminent algorithms for min-max optimization and variational inequalities problems.
Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms.
By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem.
- Score: 19.524063429548278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For min-max optimization and variational inequalities problems (VIP)
encountered in diverse machine learning tasks, Stochastic Extragradient (SEG)
and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent
algorithms. Constant step-size variants of SEG/SGDA have gained popularity,
with appealing benefits such as easy tuning and rapid forgetting of initial
conditions, but their convergence behaviors are more complicated even in
rudimentary bilinear models. Our work endeavors to elucidate and quantify the
probabilistic structures intrinsic to these algorithms. By recasting the
constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a
first-of-its-kind Law of Large Numbers and a Central Limit Theorem,
demonstrating that the average iterate is asymptotically normal with a unique
invariant distribution for an extensive range of monotone and non-monotone
VIPs. Specializing to convex-concave min-max optimization, we characterize the
relationship between the step-size and the induced bias with respect to the
Von-Neumann's value. Finally, we establish that Richardson-Romberg
extrapolation can improve proximity of the average iterate to the global
solution for VIPs. Our probabilistic analysis, underpinned by experiments
corroborating our theoretical discoveries, harnesses techniques from
optimization, Markov chains, and operator theory.
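The core objects in the abstract can be made concrete with a small numerical sketch. The bilinear game, noise model, and step sizes below are illustrative choices of this summary, not the paper's experiments: constant step-size SEG on min_x max_y x^T A y, with iterate averaging, and a Richardson-Romberg combination of the averages at step sizes gamma and gamma/2 to cancel the leading-order step-size bias.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0], [0.0, 1.5]])  # coupling matrix of min_x max_y x^T A y

def F(z):
    # Monotone operator of the bilinear game, plus additive stochastic noise.
    x, y = z[:2], z[2:]
    grad = np.concatenate([A @ y, -A.T @ x])
    return grad + 0.1 * rng.standard_normal(4)

def seg_average(gamma, n_iters=20000):
    # Constant step-size Stochastic Extragradient; returns the averaged iterate.
    z = np.ones(4)
    running_sum = np.zeros(4)
    for _ in range(n_iters):
        z_half = z - gamma * F(z)   # extrapolation step
        z = z - gamma * F(z_half)   # update step
        running_sum += z
    return running_sum / n_iters

z_bar = seg_average(0.1)         # averaged iterate at step size gamma
z_bar_half = seg_average(0.05)   # averaged iterate at step size gamma / 2
z_rr = 2.0 * z_bar_half - z_bar  # Richardson-Romberg extrapolated estimate
```

Under the paper's Law of Large Numbers, each average settles near the mean of the chain's invariant distribution; the extrapolated point `z_rr` should sit closer to the global solution (here the origin) than either average alone.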
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Stochastic Extragradient with Random Reshuffling: Improved Convergence for Variational Inequalities [40.1331539540764]
We provide a convergence analysis of SEG with Random Reshuffling (SEG-RR) for three classes of VIPs.
We derive conditions under which SEG-RR achieves a faster convergence rate than the uniform with-replacement sampling SEG.
In the monotone setting, our analysis of SEG-RR guarantees convergence to an arbitrary accuracy without large batch sizes.
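The sampling scheme this blurb refers to can be sketched on a hypothetical finite-sum operator of my own choosing (not the paper's setting): each epoch, SEG-RR draws a fresh permutation of the components and makes one pass, rather than sampling components uniformly with replacement.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# Hypothetical finite-sum strongly monotone operator F(z) = (1/n) sum_i (c_i z + b_i).
cs = [0.5 + 0.1 * i for i in range(n)]
bs = [rng.standard_normal(2) for _ in range(n)]

def seg_rr(gamma, epochs=300):
    # SEG with Random Reshuffling: one pass over a fresh permutation per epoch,
    # instead of uniform with-replacement sampling of the components.
    z = np.zeros(2)
    for _ in range(epochs):
        for i in rng.permutation(n):
            z_half = z - gamma * (cs[i] * z + bs[i])  # extrapolation step
            z = z - gamma * (cs[i] * z_half + bs[i])  # update step
    return z

z_hat = seg_rr(0.05)
z_star = -np.mean(bs, axis=0) / np.mean(cs)  # solution of F(z) = 0
```

On this strongly monotone toy problem the last iterate lands in a small neighborhood of `z_star`, consistent with the claim that SEG-RR reaches arbitrary accuracy without large batch sizes as the step size shrinks.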
arXiv Detail & Related papers (2024-03-11T20:35:52Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
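The clipping operation mentioned here is simple to state: rescale any gradient estimate whose norm exceeds a threshold. The toy operator and Student-t noise below are my own illustrative stand-ins for a heavy-tailed VIP, not the paper's benchmarks.

```python
import numpy as np

rng = np.random.default_rng(2)

def clip(g, tau):
    # Rescale the estimate to norm tau whenever it exceeds tau.
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)

def F_noisy(z):
    # Strongly monotone toy operator F(z) = z, corrupted by heavy-tailed
    # Student-t noise (df=2, so the noise variance is infinite).
    return z + rng.standard_t(df=2, size=z.shape)

def clipped_seg(gamma=0.05, tau=2.0, n_iters=5000):
    # SEG where every operator evaluation is clipped before it is applied.
    z = np.full(2, 5.0)
    for _ in range(n_iters):
        z_half = z - gamma * clip(F_noisy(z), tau)
        z = z - gamma * clip(F_noisy(z_half), tau)
    return z

z_out = clipped_seg()  # the solution of F(z) = 0 is the origin
```

Without clipping, a single heavy-tailed draw can throw the iterate far from the solution; with clipping, every step is bounded by `gamma * tau`, which is what makes high-probability guarantees with logarithmic confidence dependence possible.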
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient
Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequalities problems (VIP)
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA)
arXiv Detail & Related papers (2022-02-15T09:17:39Z) - Sharp global convergence guarantees for iterative nonconvex
optimization: A Gaussian process perspective [30.524043513721168]
We develop a general recipe for analyzing the convergence of iterative algorithms for a class of regression models.
Concretely, we accurately capture both the convergence rate of the algorithm and the eventual error floor in the finite-sample regime.
We show sharp convergence rates for both higher-order algorithms based on alternating updates and first-order algorithms based on subgradient descent.
arXiv Detail & Related papers (2021-09-20T21:48:19Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
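The descent-ascent dynamic this entry analyzes is easy to sketch. The quadratic saddle, noise level, and step size below are my own toy instance, not the paper's expected co-coercivity setting, but they show the characteristic behavior: linear convergence to a noise-dominated neighborhood under a constant step size.

```python
import numpy as np

rng = np.random.default_rng(3)

def sgda(gamma, n_iters=5000, noise=0.1):
    # Constant step-size SGDA on the strongly convex-strongly concave toy
    # saddle f(x, y) = x**2 / 2 + x * y - y**2 / 2, whose solution is (0, 0).
    x, y = 3.0, -3.0
    for _ in range(n_iters):
        gx = x + y + noise * rng.standard_normal()  # stochastic df/dx
        gy = x - y + noise * rng.standard_normal()  # stochastic df/dy
        x -= gamma * gx  # descent step in x
        y += gamma * gy  # ascent step in y
    return x, y

x_last, y_last = sgda(0.05)
```

The last iterate contracts toward the saddle point geometrically and then hovers in a neighborhood whose radius scales with the step size and the noise level, matching the last-iterate guarantee described above.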
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Convergence Analysis of Homotopy-SGD for non-convex optimization [43.71213126039448]
We present a first-order algorithm based on a combination of homotopy methods and SGD, called Homotopy-Stochastic Gradient Descent (H-SGD)
Under some assumptions, we conduct a theoretical analysis of the proposed algorithm.
Experimental results show that H-SGD can outperform SGD.
arXiv Detail & Related papers (2020-11-20T09:50:40Z) - Convergence rates and approximation results for SGD and its
continuous-time counterpart [16.70533901524849]
This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes.
First, we show that SGD can be provably approximated by solutions of an inhomogeneous Stochastic Differential Equation (SDE) using coupling.
Following recent analyses of deterministic and stochastic optimization methods via their continuous-time counterparts, we study the long-time behavior of the continuous processes at hand and derive non-asymptotic bounds.
arXiv Detail & Related papers (2020-04-08T18:31:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.