High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance
- URL: http://arxiv.org/abs/2302.00999v2
- Date: Tue, 18 Jul 2023 14:19:07 GMT
- Title: High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance
- Authors: Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
- Abstract summary: We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
- Score: 59.211456992422136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: During recent years the interest of optimization and machine learning
communities in high-probability convergence of stochastic optimization methods
has been growing. One of the main reasons for this is that high-probability
complexity bounds are more accurate and less studied than in-expectation ones.
However, SOTA high-probability non-asymptotic convergence results are derived
under strong assumptions such as the boundedness of the gradient noise variance
or of the objective's gradient itself. In this paper, we propose several
algorithms with high-probability convergence results under less restrictive
assumptions. In particular, we derive new high-probability convergence results
under the assumption that the gradient/operator noise has bounded central
$\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth
non-convex / Polyak-Lojasiewicz / convex / strongly convex / quasi-strongly
convex minimization problems, (ii) Lipschitz / star-cocoercive and monotone /
quasi-strongly monotone variational inequalities. These results justify the
usage of the considered methods for solving problems that do not fit standard
functional classes studied in stochastic optimization.
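The methods analyzed here rely on gradient clipping (see also the clipping-based entries under Related papers). Below is a minimal sketch, not the authors' exact algorithm or parameter schedule: a generic clipped-SGD loop on an illustrative quadratic objective with Student-t gradient noise, whose variance is unbounded while its central $\alpha$-th moment stays finite for $\alpha < 1.5$. The objective, stepsize, clipping level, and noise model are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)


def clip(g, lam):
    """Project the vector g onto the Euclidean ball of radius lam (gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g


def clipped_sgd(grad_oracle, x0, stepsize=0.01, clip_level=1.0, n_iters=2000):
    """Generic clipped-SGD loop: x_{k+1} = x_k - stepsize * clip(g_k, clip_level)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - stepsize * clip(grad_oracle(x), clip_level)
    return x


def noisy_grad(x):
    # Gradient of f(x) = 0.5 * ||x||^2 plus Student-t noise with 1.5 degrees of freedom:
    # the noise has infinite variance, but its central alpha-th moment is finite for alpha < 1.5.
    return x + rng.standard_t(df=1.5, size=x.shape)


x_hat = clipped_sgd(noisy_grad, x0=np.ones(5))
print(np.linalg.norm(x_hat))  # distance to the minimizer (the origin)
```

In the paper, stepsizes and clipping levels are chosen as functions of the horizon, the confidence level, and problem constants; the constants above are placeholders for illustration only.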
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients for deriving good high-probability guarantees under heavy-tailed noise.
However, clipping can spoil the convergence of popular methods for composite and distributed optimization.
arXiv Detail & Related papers (2023-10-03T07:49:17Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Analyses of convex and non-convex optimization methods often require the Lipschitzness of the gradient, which limits the class of functions they cover.
Recent work generalizes this setting via a non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to
Unknown Parameters, Unbounded Gradients and Affine Variance [33.593203156666746]
We study Stochastic Gradient Descent (SGD) with AdaGrad stepsizes, a popular adaptive (self-tuning) method for first-order stochastic optimization.
We obtain sharp rates of convergence in both the low-noise and high-noise regimes.
arXiv Detail & Related papers (2023-02-17T09:46:08Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA (a minimal sketch of a clipped SEG update appears after this list).
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds with unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Halpern Iteration for Near-Optimal and Parameter-Free Monotone Inclusion
and Strong Solutions to Variational Inequalities [14.848525762485872]
We leverage the connections between nonexpansive maps, monotone Lipschitz operators, and proximal mappings to obtain near-optimal solutions to monotone inclusion problems.
These results translate into near-optimal guarantees for approximating strong solutions to variational inequality problems, approximating convex-concave min-max optimization problems, and minimizing the norm of the gradient in min-max optimization problems.
arXiv Detail & Related papers (2020-02-20T17:12:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.