Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits
- URL: http://arxiv.org/abs/2510.10690v1
- Date: Sun, 12 Oct 2025 16:36:54 GMT
- Title: Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits
- Authors: Abdurakhmon Sadiev, Peter Richtárik, Ilyas Fatkhullin
- Abstract summary: We take a first step toward a theoretical understanding of second-order optimization under heavy-tailed noise. We introduce a novel algorithm based on gradient and Hessian clipping, and prove high-probability upper bounds that nearly match the fundamental limits.
- Score: 53.773695219320125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Heavy-tailed noise is pervasive in modern machine learning applications, arising from data heterogeneity, outliers, and non-stationary stochastic environments. While second-order methods can significantly accelerate convergence in light-tailed or bounded-noise settings, such algorithms are often brittle and lack guarantees under heavy-tailed noise -- precisely the regimes where robustness is most critical. In this work, we take a first step toward a theoretical understanding of second-order optimization under heavy-tailed noise. We consider a setting where stochastic gradients and Hessians have only bounded $p$-th moments, for some $p\in (1,2]$, and establish tight lower bounds on the sample complexity of any second-order method. We then develop a variant of normalized stochastic gradient descent that leverages second-order information and provably matches these lower bounds. To address the instability caused by large deviations, we introduce a novel algorithm based on gradient and Hessian clipping, and prove high-probability upper bounds that nearly match the fundamental limits. Our results provide the first comprehensive sample complexity characterization for second-order optimization under heavy-tailed noise. This positions Hessian clipping as a robust and theoretically sound strategy for second-order algorithm design in heavy-tailed regimes.
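For intuition, here is a minimal sketch of the clipping idea described in the abstract: the stochastic gradient and Hessian estimates are norm-clipped before being used in a damped Newton-type step, so that no single heavy-tailed sample can move the iterate too far. The thresholds `tau_g`/`tau_h`, the damping `lam`, the step size `eta`, and the update rule itself are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def clip_by_norm(x, tau):
    """Scale a vector or matrix so its (Frobenius) norm is at most tau."""
    nrm = np.linalg.norm(x)
    return x if nrm <= tau else (tau / nrm) * x

def clipped_second_order_step(w, grad_sample, hess_sample, tau_g, tau_h, lam, eta):
    """One illustrative update with gradient and Hessian clipping.

    grad_sample : stochastic gradient estimate at w (1-D array)
    hess_sample : stochastic Hessian estimate at w (2-D array)
    tau_g, tau_h: clipping thresholds (assumed hyperparameters)
    lam, eta    : damping and step size (assumed hyperparameters)
    """
    g = clip_by_norm(grad_sample, tau_g)              # robust gradient estimate
    H = clip_by_norm(hess_sample, tau_h)              # robust Hessian estimate
    d = np.linalg.solve(H + lam * np.eye(len(w)), g)  # damped Newton-type direction
    return w - eta * d
```

Clipping bounds the influence of any single large-deviation sample, which is what makes high-probability guarantees plausible in the bounded $p$-th moment regime the paper studies.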
Related papers
- Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises [27.097405340259325]
Existing decentralized optimization methods assume the lower-level loss function is strongly convex and that the gradient noise has finite variance. This is the first decentralized bilevel algorithm with rigorous theoretical guarantees under heavy-tailed noise.
arXiv Detail & Related papers (2025-09-19T02:51:19Z) - High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise [23.663813244183984]
We propose a Trust-Region Stochastic Sequential Quadratic Programming (TR-SSQP) method. We establish its high-probability complexity bounds for identifying first- and second-order $\epsilon$-stationary points. We show that our method achieves the same high-probability first-order complexity bounds as in the light-tailed noise setting.
arXiv Detail & Related papers (2025-03-24T19:23:13Z) - From Gradient Clipping to Normalization for Heavy Tailed SGD [19.369399536643773]
Recent empirical evidence indicates that machine learning applications involve heavy-tailed noise, which challenges the standard bounded-variance assumption in practice. In this paper, we show that it is possible to achieve tight convergence guarantees under heavy-tailed noise (a minimal first-order sketch contrasting clipping and normalization appears after this list).
arXiv Detail & Related papers (2024-10-17T17:59:01Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm has access only to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider stochastic optimization problems with heavy-tailed noise that has a structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
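Several of the related papers above contrast gradient clipping with normalization as first-order remedies for heavy-tailed noise. The snippet below is a minimal sketch of the two update rules under assumed step sizes and thresholds; it is not taken from any of the cited papers.

```python
import numpy as np

def clipped_sgd_step(w, grad_sample, tau, eta):
    """Clipped SGD: rescale the stochastic gradient only when its norm exceeds tau."""
    nrm = np.linalg.norm(grad_sample)
    g = grad_sample if nrm <= tau else (tau / nrm) * grad_sample
    return w - eta * g

def normalized_sgd_step(w, grad_sample, eta, eps=1e-12):
    """Normalized SGD: always take a step of length eta along the gradient direction."""
    g = grad_sample / (np.linalg.norm(grad_sample) + eps)
    return w - eta * g
```

Both updates cap how far a single noisy sample can push the iterate, which is the key property exploited in the heavy-tailed analyses listed above.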