Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise
- URL: http://arxiv.org/abs/2506.11214v1
- Date: Thu, 12 Jun 2025 18:21:15 GMT
- Title: Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise
- Authors: Chuan He, Zhaosong Lu, Defeng Sun, Zhanwang Deng
- Abstract summary: We propose practical normalized first-order methods for unconstrained optimization problems. Our complexity bounds either improve upon or match the best-known results in the literature.
- Score: 6.733083995005145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose practical normalized stochastic first-order methods with Polyak momentum, multi-extrapolated momentum, and recursive momentum for solving unconstrained optimization problems. These methods employ dynamically updated algorithmic parameters and do not require explicit knowledge of problem-dependent quantities such as the Lipschitz constant or noise bound. We establish first-order oracle complexity results for finding approximate stochastic stationary points under heavy-tailed noise and weakly average smoothness conditions -- both of which are weaker than the commonly used bounded variance and mean-squared smoothness assumptions. Our complexity bounds either improve upon or match the best-known results in the literature. Numerical experiments are presented to demonstrate the practical effectiveness of the proposed methods.
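As a rough illustration of the normalized-momentum idea in the abstract, the sketch below implements a generic normalized SGD step with Polyak (heavy-ball) momentum on a toy quadratic with heavy-tailed Student-t gradient noise. The fixed step size and momentum parameter here are simplified placeholders, not the dynamically updated rules of the paper.

```python
import numpy as np

def normalized_sgd_momentum(grad_oracle, x0, steps=5000, eta=0.02, beta=0.9):
    """Normalized SGD with Polyak (heavy-ball) momentum.

    The update direction is the momentum buffer normalized to unit
    length, so the step length is insensitive to the gradient
    magnitude (and hence to heavy-tailed noise spikes).
    """
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad_oracle(x)                              # stochastic gradient sample
        m = beta * m + (1.0 - beta) * g                 # Polyak momentum average
        x = x - eta * m / (np.linalg.norm(m) + 1e-12)   # normalized step
    return x

# Toy problem: minimize f(x) = 0.5 * ||x||^2 with heavy-tailed
# (Student-t, 3 degrees of freedom) gradient noise.
rng = np.random.default_rng(0)
oracle = lambda x: x + rng.standard_t(df=3, size=x.shape)
x0 = np.ones(10) * 5.0
x_final = normalized_sgd_momentum(oracle, x0)
```

Because the step length is fixed at `eta` regardless of how large a noisy gradient sample is, a single heavy-tailed spike cannot blow up the iterate, which is the intuition behind using normalization in this regime.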
Related papers
- Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems [0.5156484100374059]
We explore the relationship between recent line search approaches for stochastic optimization in the overparametrized deep learning regime and momentum directions.
We introduce an algorithm that exploits a mix of data persistency and conjugate-gradient-type rules for the definition of the momentum parameter.
The resulting algorithm is empirically shown to outperform other popular methods.
arXiv Detail & Related papers (2024-11-11T16:26:33Z) - Accelerated stochastic approximation with state-dependent noise [7.4648480208501455]
We consider a class of smooth convex optimization problems under general assumptions on the state-dependent noise in the gradient observations.
Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics.
We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate.
arXiv Detail & Related papers (2023-07-04T06:06:10Z) - First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities [91.46841922915418]
We present a unified approach for the theoretical analysis of first-order methods with Markovian noise.
Our approach covers both non-convex optimization and strongly monotone variational inequality problems.
We provide bounds that match the oracle complexity in the case of strongly convex optimization problems.
arXiv Detail & Related papers (2023-05-25T11:11:31Z) - Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithms [8.669461942767098]
We study momentum-based first-order optimization algorithms in which the iterations are subject to an additive white noise.
For strongly convex quadratic problems, we use the steady-state variance of the error in the optimization variable to quantify noise amplification.
We introduce two parameterized families of algorithms that strike a balance between noise amplification and settling time.
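The tradeoff described above can be observed numerically. The sketch below (a generic heavy-ball iteration on a scalar quadratic, not the parameterized families from the paper) estimates the steady-state variance of the iterate under additive white gradient noise, with and without momentum.

```python
import numpy as np

def steady_state_variance(beta, a=1.0, eta=0.1, sigma=1.0,
                          iters=200_000, burn_in=10_000, seed=1):
    """Empirical steady-state variance of the heavy-ball iteration
    x_{k+1} = x_k - eta*(a*x_k + w_k) + beta*(x_k - x_{k-1})
    on f(x) = a*x^2/2 with additive white noise w_k ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    x_prev, x, samples = 0.0, 0.0, []
    for k in range(iters):
        w = rng.normal(0.0, sigma)
        x_next = x - eta * (a * x + w) + beta * (x - x_prev)
        x_prev, x = x, x_next
        if k >= burn_in:              # discard burn-in transient
            samples.append(x)
    return float(np.var(samples))

var_plain = steady_state_variance(beta=0.0)   # plain gradient descent
var_hb = steady_state_variance(beta=0.7)      # heavy-ball momentum
# Momentum accelerates transient convergence but, as this comparison
# shows, it amplifies the stationary variance of the error.
```

For these parameters the closed-form AR(2) stationary variance is about 0.053 without momentum and about 0.172 with beta = 0.7, so the simulation makes the amplification concrete.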
arXiv Detail & Related papers (2022-09-24T04:26:30Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with negative-power dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on stochastic methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z) - Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity [58.70807593332932]
We study the oracle complexity of gradient-based methods for stochastic approximation problems.
We focus on instance-dependent complexity instead of worst case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
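For reference, the clipping operator that these methods build on is simple. The sketch below shows the generic norm-clipping step only; the stepsize and clipping-level schedules of clipped-SSTM and clipped SGD from the cited paper are not reproduced here.

```python
import numpy as np

def clip_gradient(g, tau):
    """Rescale a stochastic gradient so its Euclidean norm is at most tau.

    Clipping bounds the influence of any single heavy-tailed noise
    sample, which is what enables high-probability guarantees without
    a light-tails assumption.
    """
    norm = np.linalg.norm(g)
    if norm > tau:
        return g * (tau / norm)
    return g

g = np.array([3.0, 4.0])                    # norm 5
clipped = clip_gradient(g, 2.5)             # rescaled to norm 2.5
untouched = clip_gradient(g, 10.0)          # below threshold, unchanged
```

The direction of the gradient is preserved; only its magnitude is capped at the clipping level `tau`.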
arXiv Detail & Related papers (2020-05-21T17:05:27Z) - Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization [25.680334940504405]
This paper establishes the convergence rate of a stochastic subgradient method with momentum for non-smooth constrained problems.
We also show how the unconstrained case can be analyzed under weaker assumptions than the state-of-the-art.
arXiv Detail & Related papers (2020-02-13T12:10:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.