High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise
- URL: http://arxiv.org/abs/2208.08567v2
- Date: Sun, 14 Apr 2024 21:29:23 GMT
- Title: High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise
- Authors: Daniela A. Parletta, Andrea Paudice, Massimiliano Pontil, Saverio Salzo
- Abstract summary: We study high probability bounds for subgradient methods under heavy tailed noise.
We show that this clipping strategy leads to both near-optimal any-time and finite-horizon bounds for many classical averaging schemes.
- Score: 32.27255488868277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we study high probability bounds for stochastic subgradient methods under heavy tailed noise. In this setting the noise is only assumed to have finite variance, as opposed to a sub-Gaussian distribution, for which it is known that standard subgradient methods enjoy high probability bounds. We analyze a clipped version of the projected stochastic subgradient method, where subgradient estimates are truncated whenever they have large norms. We show that this clipping strategy leads to both near-optimal any-time and finite-horizon bounds for many classical averaging schemes. Preliminary experiments are presented to support the validity of the method.
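Below is a minimal sketch of the clipped projected stochastic subgradient step described in the abstract, assuming a Euclidean projection oracle. The function names, clipping threshold `tau`, stepsize schedule `gamma`, and uniform iterate averaging are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def clip(g, tau):
    """Truncate g so that its Euclidean norm is at most tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else (tau / norm) * g

def clipped_projected_subgradient(subgrad, project, x0, n_steps, tau, gamma):
    """Projected stochastic subgradient method with clipped estimates.

    subgrad(x): stochastic subgradient oracle (finite variance assumed).
    project(x): Euclidean projection onto the feasible set.
    gamma(t):   stepsize at iteration t.
    Returns the uniform average of the iterates (one classical averaging scheme).
    """
    x = np.asarray(x0, dtype=float)
    avg = x.copy()
    for t in range(n_steps):
        g = clip(subgrad(x), tau)       # truncate large-norm estimates
        x = project(x - gamma(t) * g)   # projected subgradient step
        avg += (x - avg) / (t + 2)      # running uniform average of iterates
    return avg
```

For instance, `project = lambda x: x / max(1.0, np.linalg.norm(x))` projects onto the unit Euclidean ball.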
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM); a schematic SPPM step is sketched below.
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
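A schematic sketch of one stochastic proximal point iteration, assuming the sampled proximal subproblem is solved numerically; the solver choice and stepsize are illustrative, not the paper's method.

```python
import numpy as np
from scipy.optimize import minimize

def sppm_step(f_xi, x, gamma):
    """One stochastic proximal point step on a sampled loss f_xi:
    x_next = argmin_z  f_xi(z) + ||z - x||^2 / (2 * gamma).
    Nelder-Mead is used here since f_xi need not be smooth."""
    objective = lambda z: f_xi(z) + np.sum((z - x) ** 2) / (2.0 * gamma)
    return minimize(objective, x, method="Nelder-Mead").x
```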
- Variance Reduction and Low Sample Complexity in Stochastic Optimization via Proximal Point Method [5.025654873456757]
The paper establishes a low sample complexity to obtain a high probability guarantee on the convergence of the proposed method.
A subroutine is developed to solve the proximal subproblem, which also serves as a novel technique for variance reduction.
arXiv Detail & Related papers (2024-02-14T07:34:22Z)
- High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients used to derive good high-probability guarantees.
However, clipping can spoil the convergence of popular methods for composite and distributed optimization; a clipped composite step is sketched below.
arXiv Detail & Related papers (2023-10-03T07:49:17Z)
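A hedged sketch of how clipping composes with a proximal step in composite optimization min f + R: the stochastic gradient of the smooth part is clipped before applying the proximal operator of R. The L1 regularizer (soft-thresholding prox) and threshold are illustrative assumptions, not this paper's algorithm.

```python
import numpy as np

def clip(g, lam):
    """Scale g down so that its Euclidean norm is at most lam."""
    n = np.linalg.norm(g)
    return g if n <= lam else (lam / n) * g

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (an example regularizer R)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def clipped_prox_sgd_step(x, grad_f, gamma, lam, reg):
    """One proximal SGD step with a clipped gradient estimate:
    x_next = prox_{gamma * reg * ||.||_1}(x - gamma * clip(grad_f(x), lam))."""
    return soft_threshold(x - gamma * clip(grad_f(x), lam), gamma * reg)
```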
- Almost-sure convergence of iterates and multipliers in stochastic sequential quadratic optimization [21.022322975077653]
Stochastic sequential quadratic optimization (SQP) methods for solving continuous optimization problems with equality constraints have attracted attention recently.
Convergence guarantees have been limited to the expected value of a stationarity measure decaying to zero.
New almost-sure convergence guarantees for the primal iterates, Lagrange multipliers, and stationarity measures generated by a stochastic SQP algorithm are proved.
arXiv Detail & Related papers (2023-08-07T16:03:40Z)
- SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance [33.593203156666746]
We study AdaGrad stepsizes, a popular adaptive (self-tuning) method for first-order stochastic optimization.
We find sharp rates of convergence in both the low-noise and high-noise regimes; a sketch of an AdaGrad-norm stepsize follows this entry.
arXiv Detail & Related papers (2023-02-17T09:46:08Z)
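A minimal sketch of the scalar AdaGrad-norm stepsize, which self-tunes from accumulated gradient magnitudes without knowing the noise level; the constants and the scalar (rather than coordinate-wise) variant are illustrative choices.

```python
import numpy as np

def adagrad_norm(grad, x0, n_steps, eta=1.0, b0=1e-8):
    """SGD with AdaGrad-norm stepsizes: eta / sqrt(b0^2 + sum_s ||g_s||^2)."""
    x = np.asarray(x0, dtype=float)
    acc = b0 ** 2
    for _ in range(n_steps):
        g = grad(x)                   # stochastic gradient estimate
        acc += np.dot(g, g)           # accumulate squared gradient norms
        x -= eta / np.sqrt(acc) * g   # self-tuned step: no noise level needed
    return x
```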
- A note on diffusion limits for stochastic gradient descent [0.0]
Much of the theory that attempts to clarify the role of noise in stochastic gradient algorithms approximates stochastic gradient descent by a differential equation driven by Gaussian noise.
We provide a novel theoretical justification for this practice that showcases how the noise arises naturally; the diffusion approximation is simulated in the sketch below.
arXiv Detail & Related papers (2022-10-20T13:27:00Z)
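A sketch of the diffusion approximation mentioned above: SGD with stepsize h is modeled by the SDE dX = -grad f(X) dt + sqrt(h) * sigma * dW, simulated with Euler-Maruyama. The isotropic noise scale sigma is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_diffusion(grad_f, x0, h, n_steps, sigma):
    """Euler-Maruyama simulation (dt = h) of the Gaussian diffusion
    dX = -grad f(X) dt + sqrt(h) * sigma * dW commonly used to
    approximate SGD with stepsize h."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        # drift step plus sqrt(dt) * diffusion term: sqrt(h) * sqrt(h) = h
        x = x - h * grad_f(x) + h * sigma * z
    return x
```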
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for stochastic methods solving monotone and structured non-monotone variational inequality problems (VIPs).
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA (a clipped SGDA step is sketched below).
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
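A minimal sketch of clipped stochastic gradient descent-ascent (SGDA) for a min-max problem min_x max_y phi(x, y); clipping the stacked operator jointly, the threshold, and the single stepsize are illustrative assumptions.

```python
import numpy as np

def clip(v, lam):
    """Scale v down so that its Euclidean norm is at most lam."""
    n = np.linalg.norm(v)
    return v if n <= lam else (lam / n) * v

def clipped_sgda(grad_x, grad_y, x, y, n_steps, gamma, lam):
    """Clipped SGDA for min_x max_y phi(x, y): the stacked operator
    (grad_x phi, -grad_y phi) is clipped, then x descends and y ascends."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    for _ in range(n_steps):
        g = clip(np.concatenate([grad_x(x, y), -grad_y(x, y)]), lam)
        x = x - gamma * g[: x.size]
        y = y - gamma * g[x.size :]
    return x, y
```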
- High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with the best-known iteration complexity for smooth losses; this combination is sketched below.
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
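A hedged sketch combining the three ingredients named above: clip the gradient estimate, average it into a momentum buffer, and take a normalized step; the constants and the exact combination are illustrative, not the paper's precise algorithm.

```python
import numpy as np

def clip(g, lam):
    """Scale g down so that its Euclidean norm is at most lam."""
    n = np.linalg.norm(g)
    return g if n <= lam else (lam / n) * g

def normalized_clipped_momentum(grad, x0, n_steps, gamma, lam, beta=0.9):
    """Normalized SGD with clipped momentum: m averages clipped gradient
    estimates, and each step moves a fixed length gamma along m / ||m||."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(n_steps):
        m = beta * m + (1 - beta) * clip(grad(x), lam)  # clipped momentum
        x -= gamma * m / (np.linalg.norm(m) + 1e-12)    # normalized step
    return x
```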
- Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds whose dependence on the confidence level is either negative-power or logarithmic.
We propose novel stepsize rules for two stochastic gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed noise in the gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We also derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)