Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise
- URL: http://arxiv.org/abs/2505.20817v1
- Date: Tue, 27 May 2025 07:23:42 GMT
- Title: Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise
- Authors: Savelii Chezhegov, Aleksandr Beznosikov, Samuel Horváth, Eduard Gorbunov
- Abstract summary: First-order methods with clipping, such as Clip-SGD, exhibit stronger convergence guarantees than SGD under the $(L_0,L_1)$-smoothness assumption. We establish the first high-probability convergence bounds for Clip-SGD applied to convex $(L_0,L_1)$-smooth optimization with heavy-tailed noise.
- Score: 60.17850744118546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient clipping is a widely used technique in Machine Learning and Deep Learning (DL), known for its effectiveness in mitigating the impact of heavy-tailed noise, which frequently arises in the training of large language models. Additionally, first-order methods with clipping, such as Clip-SGD, exhibit stronger convergence guarantees than SGD under the $(L_0,L_1)$-smoothness assumption, a property observed in many DL tasks. However, the high-probability convergence of Clip-SGD under both assumptions -- heavy-tailed noise and $(L_0,L_1)$-smoothness -- has not been fully addressed in the literature. In this paper, we bridge this critical gap by establishing the first high-probability convergence bounds for Clip-SGD applied to convex $(L_0,L_1)$-smooth optimization with heavy-tailed noise. Our analysis extends prior results by recovering known bounds for the deterministic case and the stochastic setting with $L_1 = 0$ as special cases. Notably, our rates avoid exponentially large factors and do not rely on restrictive sub-Gaussian noise assumptions, significantly broadening the applicability of gradient clipping.
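For context, the two assumptions in the abstract admit standard formalizations; the sketch below uses one common convention (the paper's exact constants, norms, and locality conditions may differ):

```latex
% (L_0, L_1)-smoothness (generalized smoothness), one common form:
\|\nabla f(x) - \nabla f(y)\| \le \bigl(L_0 + L_1 \|\nabla f(x)\|\bigr)\,\|x - y\|
\quad \text{for all } x, y.
% Heavy-tailed noise: bounded \alpha-th central moment for some \alpha \in (1, 2]:
\mathbb{E}_{\xi}\bigl[\|\nabla f(x, \xi) - \nabla f(x)\|^{\alpha}\bigr] \le \sigma^{\alpha}.
% Clip-SGD with stepsize \gamma and clipping level \lambda:
\operatorname{clip}(g, \lambda) = \min\Bigl\{1, \tfrac{\lambda}{\|g\|}\Bigr\} g,
\qquad
x^{k+1} = x^{k} - \gamma\,\operatorname{clip}\bigl(\nabla f(x^{k}, \xi^{k}), \lambda\bigr).
```

A minimal Python sketch of the Clip-SGD iteration under these assumptions; the names `grad_oracle`, `stepsize`, and `clip_level` are illustrative and do not reflect the paper's notation or stepsize/clipping-level tuning:

```python
import numpy as np

def clip(g: np.ndarray, clip_level: float) -> np.ndarray:
    """Rescale g so its norm is at most clip_level (no-op if already smaller)."""
    norm = np.linalg.norm(g)
    return g if norm <= clip_level else (clip_level / norm) * g

def clip_sgd(x0: np.ndarray, grad_oracle, stepsize: float,
             clip_level: float, num_steps: int) -> np.ndarray:
    """Run Clip-SGD: x_{k+1} = x_k - stepsize * clip(g_k, clip_level),
    where g_k is a (possibly heavy-tailed) stochastic gradient at x_k."""
    x = x0.copy()
    for _ in range(num_steps):
        g = grad_oracle(x)  # stochastic gradient with heavy-tailed noise
        x = x - stepsize * clip(g, clip_level)
    return x
```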
Related papers
- Can SGD Handle Heavy-Tailed Noise? [6.111519084375339]
Stochastic Gradient Descent (SGD) is a workhorse of large-scale machine learning optimization, yet its theoretical behavior under heavy-tailed noise is poorly understood. We rigorously investigate whether SGD can provably succeed under such adverse conditions.
arXiv Detail & Related papers (2025-08-06T20:09:41Z) - Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level [12.47309834217498]
We provide the first high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level. Our results show that, with a fixed clipping level, the method converges to a neighborhood of the optimal solution at a faster rate than the existing ones. The size of the neighborhood is balanced against the noise introduced by DP, providing a refined trade-off between convergence speed and privacy guarantees.
arXiv Detail & Related papers (2025-07-31T12:48:29Z) - Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises [55.43924214633558]
In this paper, we focus on two types of noise: sub-Weibull noise and SsBC noise. Under these two noise assumptions, the in-expectation and high-probability convergence of SFOMs (stochastic first-order methods) has been studied in the contexts of convex optimization and smooth optimization.
arXiv Detail & Related papers (2025-07-17T16:48:45Z) - Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Stochastic Gradient Descent (SGD) under heavy-tailed noise (a minimal sketch contrasting the two update rules appears after this list).
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - From Gradient Clipping to Normalization for Heavy Tailed SGD [19.369399536643773]
Recent empirical evidence indicates that machine learning applications involve heavy-tailed noise, which challenges the standard bounded-variance assumption in practice. In this paper, we show that it is possible to achieve tight convergence guarantees under heavy-tailed noise.
arXiv Detail & Related papers (2024-10-17T17:59:01Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider stochastic optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise.
We demonstrate, analytically and numerically, how our bounds can inform the preferred choice of setting for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z) - Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping [40.69950711262191]
We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP).
We establish new convergence results and improved complexity bounds for the proposed algorithm called AClipped-dpSGD for constrained and unconstrained convex problems.
arXiv Detail & Related papers (2022-06-27T01:39:15Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds with an unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
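As noted in the gradient-normalization entry above, here is a minimal sketch contrasting the clipped and normalized update rules discussed across these papers; the helper names are illustrative and not taken from any listed work:

```python
import numpy as np

def clipped_step(x, g, stepsize, clip_level):
    """Clipped update: rescale g only when its norm exceeds clip_level."""
    norm = np.linalg.norm(g)
    scale = min(1.0, clip_level / norm) if norm > 0 else 1.0
    return x - stepsize * scale * g

def normalized_step(x, g, stepsize, eps=1e-8):
    """Normalized update: always divide g by its norm (eps avoids division by zero)."""
    return x - stepsize * g / (np.linalg.norm(g) + eps)
```

Clipping leaves small gradients untouched, while normalization rescales every gradient to (approximately) unit norm; the entries above study how each behaves under heavy-tailed noise.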