Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise
- URL: http://arxiv.org/abs/2206.01095v1
- Date: Thu, 2 Jun 2022 15:21:55 GMT
- Title: Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise
- Authors: Eduard Gorbunov, Marina Danilova, David Dobre, Pavel Dvurechensky,
Alexander Gasnikov, Gauthier Gidel
- Abstract summary: We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
- Score: 64.85879194013407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic first-order methods such as Stochastic Extragradient (SEG) or
Stochastic Gradient Descent-Ascent (SGDA) for solving smooth minimax problems
and, more generally, variational inequality problems (VIP) have been gaining a
lot of attention in recent years due to the growing popularity of adversarial
formulations in machine learning. However, while high-probability convergence
bounds are known to reflect the actual behavior of stochastic methods more
accurately, most convergence results are provided in expectation. Moreover, the
only known high-probability complexity results have been derived under
restrictive sub-Gaussian (light-tailed) noise and bounded-domain assumptions
[Juditsky et al., 2011]. In this work, we prove the first high-probability
complexity results with logarithmic dependence on the confidence level for
stochastic methods for solving monotone and structured non-monotone VIPs with
non-sub-Gaussian (heavy-tailed) noise and unbounded domains. In the monotone
case, our results match the best-known ones in the light-tails case [Juditsky
et al., 2011], and are novel for structured non-monotone problems such as
negative comonotone, quasi-strongly monotone, and/or star-cocoercive ones. We
achieve these results by studying SEG and SGDA with clipping. In addition, we
numerically validate that the gradient noise of many practical GAN formulations
is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
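For intuition, here is a minimal sketch of the clipping idea behind the SEG variant studied in the paper, applied to a toy bilinear min-max problem with synthetic heavy-tailed (Student-t) noise. The step size, clipping level, noise model, and function names are illustrative assumptions, not the paper's exact algorithm or parameters.

```python
import numpy as np

def clip(v, lam):
    # Clipping operator: rescale v so that its norm is at most lam.
    norm = np.linalg.norm(v)
    return v if norm <= lam else v * (lam / norm)

def noisy_operator(z, A, rng):
    # Operator F(z) of the bilinear game min_x max_y x^T A y,
    # with synthetic heavy-tailed (Student-t) noise added.
    x, y = np.split(z, 2)
    F = np.concatenate([A @ y, -A.T @ x])
    return F + rng.standard_t(df=3, size=z.shape)

def clipped_seg(z0, A, steps=10_000, gamma=1e-2, lam=1.0, seed=0):
    # Stochastic Extragradient where each stochastic operator estimate is clipped.
    rng = np.random.default_rng(seed)
    z = z0.copy()
    for _ in range(steps):
        g = clip(noisy_operator(z, A, rng), lam)            # extrapolation step
        z_half = z - gamma * g
        g_half = clip(noisy_operator(z_half, A, rng), lam)  # update step
        z = z - gamma * g_half
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    z_out = clipped_seg(np.ones(10), A)
    residual = np.concatenate([A @ z_out[5:], -A.T @ z_out[:5]])
    print("||F(z)|| at output:", np.linalg.norm(residual))
```

Clipping the stochastic estimates, rather than the iterates, is what controls the heavy tails: a single large noise realization can move the iterate by at most gamma * lam per step.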
Related papers
- From Gradient Clipping to Normalization for Heavy Tailed SGD [19.369399536643773]
Recent empirical evidence indicates that machine learning applications involve heavy-tailed noise, which challenges the standard assumptions of bounded variance in practice.
In this paper, we show that it is possible to obtain tight convergence guarantees for gradient-based methods under heavy-tailed noise (a brief sketch contrasting clipping and normalization appears after this list).
arXiv Detail & Related papers (2024-10-17T17:59:01Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider stochastic optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients to derive good high-probability guarantees.
However, clipping can spoil the convergence of popular methods for composite and distributed optimization.
arXiv Detail & Related papers (2023-10-03T07:49:17Z) - Stochastic Methods in Variational Inequalities: Ergodicity, Bias and
Refinements [19.524063429548278]
Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) are preeminent algorithms for min-max optimization and variational inequality problems.
Our work endeavors to quantify the intrinsic structures of these algorithms.
By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem.
arXiv Detail & Related papers (2023-06-28T18:50:07Z) - SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to
Unknown Parameters, Unbounded Gradients and Affine Variance [33.593203156666746]
We study SGD with AdaGrad stepsizes, a popular adaptive (self-tuning) method for first-order optimization.
We find sharp rates of convergence in both the low-noise and high-noise regimes.
arXiv Detail & Related papers (2023-02-17T09:46:08Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
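To make the clipping-versus-normalization distinction drawn in the first related paper above concrete, here is a minimal sketch contrasting a clipped SGD step with a normalized SGD step. The function names, step size, and clipping level are illustrative assumptions rather than either paper's exact method.

```python
import numpy as np

def clipped_sgd_step(w, grad, lr=1e-2, lam=1.0):
    # Clipping: shrink the stochastic gradient only when its norm exceeds lam.
    norm = np.linalg.norm(grad)
    g = grad if norm <= lam else grad * (lam / norm)
    return w - lr * g

def normalized_sgd_step(w, grad, lr=1e-2, eps=1e-8):
    # Normalization: always rescale the stochastic gradient to (near) unit norm.
    return w - lr * grad / (np.linalg.norm(grad) + eps)

# Example: one step on a heavy-tailed (Student-t) gradient sample.
rng = np.random.default_rng(0)
w = np.zeros(3)
g = rng.standard_t(df=2, size=3)
print(clipped_sgd_step(w, g), normalized_sgd_step(w, g))
```

The two updates coincide whenever the gradient norm exceeds the clipping level; they differ on small gradients, which clipping leaves untouched while normalization inflates.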