Bias-Variance Trade-off for Clipped Stochastic First-Order Methods: From Bounded Variance to Infinite Mean
- URL: http://arxiv.org/abs/2512.14686v1
- Date: Tue, 16 Dec 2025 18:52:15 GMT
- Title: Bias-Variance Trade-off for Clipped Stochastic First-Order Methods: From Bounded Variance to Infinite Mean
- Authors: Chuan He
- Abstract summary: Clipping has emerged as a key technique for controlling heavy-tailed gradients in stochastic first-order methods (SFOMs). This paper tackles the general case of noise with tail index $\alpha \in (0,2]$, covering regimes ranging from noise with bounded variance to noise with an infinite mean. We show that when a symmetry measure of the noise tail is controlled, clipped SFOMs achieve improved complexity guarantees in the presence of heavy-tailed noise.
- Score: 1.0064470833375987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic optimization is fundamental to modern machine learning. Recent research has extended the study of stochastic first-order methods (SFOMs) from light-tailed to heavy-tailed noise, which frequently arises in practice, with clipping emerging as a key technique for controlling heavy-tailed gradients. Extensive theoretical advances have further shown that the oracle complexity of SFOMs depends on the tail index $\alpha$ of the noise. Nonetheless, existing complexity results often cover only the case $\alpha \in (1,2]$, that is, the regime where the noise has a finite mean, while the complexity bounds tend to infinity as $\alpha$ approaches $1$. This paper tackles the general case of noise with tail index $\alpha \in (0,2]$, covering regimes ranging from noise with bounded variance to noise with an infinite mean, where the latter case has been scarcely studied. Through a novel analysis of the bias-variance trade-off in gradient clipping, we show that when a symmetry measure of the noise tail is controlled, clipped SFOMs achieve improved complexity guarantees in the presence of heavy-tailed noise for any tail index $\alpha \in (0,2]$. Our analysis of the bias-variance trade-off not only yields new unified complexity guarantees for clipped SFOMs across this full range of tail indices, but is also straightforward to apply and can be combined with classical analyses under light-tailed noise to establish oracle complexity guarantees under heavy-tailed noise. Finally, numerical experiments validate our theoretical findings.
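To make the clipping mechanism concrete, here is a minimal Python sketch, not the paper's algorithm: the names `clipped_sgd` and `noisy_grad` and all parameter values are illustrative. The clipping threshold is exactly where the bias-variance trade-off enters: a small threshold suppresses heavy-tailed gradient spikes (low variance, more bias), while a large threshold keeps the estimate nearly unbiased but admits rare huge steps (high variance). The toy noise is symmetric Pareto with tail index $\alpha < 1$, i.e., infinite mean, matching the regime the abstract highlights.

```python
import numpy as np

def clipped_sgd(grad_oracle, x0, step_size, clip_threshold, n_iters, rng):
    """SGD with gradient clipping (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad_oracle(x, rng)            # stochastic gradient, possibly heavy-tailed
        norm = np.linalg.norm(g)
        if norm > clip_threshold:          # project onto the ball of radius clip_threshold
            g = g * (clip_threshold / norm)
        x = x - step_size * g
    return x

def noisy_grad(x, rng, alpha=0.9):
    """Gradient of f(x) = 0.5 * ||x||^2 plus symmetric Pareto noise.

    For alpha <= 1 the Pareto (Lomax) noise has an infinite mean; the
    random sign keeps the tail symmetric, in the spirit of the paper's
    symmetry condition on the noise tail.
    """
    noise = rng.pareto(alpha, size=x.shape) * rng.choice([-1.0, 1.0], size=x.shape)
    return x + noise

rng = np.random.default_rng(0)
x_final = clipped_sgd(noisy_grad, x0=np.ones(10), step_size=0.01,
                      clip_threshold=5.0, n_iters=5000, rng=rng)
print(np.linalg.norm(x_final))  # should end up far smaller than ||x0||
```

Sweeping `clip_threshold` in such a toy run is a quick way to see the trade-off: very small thresholds converge slowly (bias dominates), while very large ones destabilize the iterates (variance dominates).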
Related papers
- Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits [53.773695219320125]
We take a first step toward a theoretical understanding of second-order optimization under heavy-tailed noise. We introduce a novel algorithm based on gradient and Hessian clipping, and prove high-probability upper bounds that nearly match the fundamental limits.
arXiv Detail & Related papers (2025-10-12T16:36:54Z) - Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises [55.43924214633558]
In this paper, we focus on two types of noise: one is sub-Weibull noise, and the other is SsBC noise. Under these two noise assumptions, the in-expectation and high-probability convergence of SFOMs have been studied in the contexts of convex optimization and smooth optimization.
arXiv Detail & Related papers (2025-07-17T16:48:45Z) - Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise [65.40001744848615]
First-order methods with clipping, such as Clip-SGD, exhibit stronger convergence guarantees than SGD under the $(L_0,L_1)$-smoothness assumption. We establish the first high-probability convergence bounds for Clip-SGD applied to convex $(L_0,L_1)$-smooth optimization with heavy-tailed noise.
arXiv Detail & Related papers (2025-05-27T07:23:42Z) - Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees [56.80920351680438]
We study high-probability convergence in online learning, in the presence of heavy-tailed noise. We provide guarantees for a broad class of nonlinearities, without any assumptions on noise moments.
arXiv Detail & Related papers (2024-10-17T18:25:28Z) - From Gradient Clipping to Normalization for Heavy Tailed SGD [19.369399536643773]
Recent empirical evidence indicates that machine learning applications involve heavy-tailed noise, which challenges the standard assumption of bounded variance in practice. In this paper, we show that it is possible to achieve tight convergence guarantees for gradient-based methods under heavy-tailed noise (a minimal sketch contrasting clipped and normalized updates appears after this list).
arXiv Detail & Related papers (2024-10-17T17:59:01Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped stochastic methods for optimization problems with heavy-tailed noise that has a structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for stochastic methods solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Sample-Optimal PAC Learning of Halfspaces with Malicious Noise [4.8728183994912415]
We study efficient PAC learning of halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise of Valiant (1985).
We present a new analysis for the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$.
We extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
arXiv Detail & Related papers (2021-02-11T20:18:20Z)
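As a concrete companion to the "From Gradient Clipping to Normalization" entry above, the two update rules differ only in when the gradient is rescaled. Below is a minimal sketch under illustrative names (`clipped_step` and `normalized_step` are not taken from any of the listed papers):

```python
import numpy as np

def clipped_step(x, g, step_size, threshold):
    # Clipping: rescale only when the gradient norm exceeds the threshold,
    # so typical (small) gradients pass through unmodified and unbiased.
    norm = np.linalg.norm(g)
    if norm > threshold:
        g = g * (threshold / norm)
    return x - step_size * g

def normalized_step(x, g, step_size, eps=1e-8):
    # Normalization: always take a fixed-length step in the gradient
    # direction, bounding every update regardless of how heavy the noise
    # tail is, at the cost of discarding all magnitude information.
    return x - step_size * g / (np.linalg.norm(g) + eps)
```

Clipping thus interpolates between plain SGD (threshold $\to \infty$) and normalized SGD (threshold $\to 0$ with a rescaled step size), which is one way to read the bias-variance trade-off studied in the main paper.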