Related papers: A stochastic first-order method with multi-extrapolated momentum for highly smooth unconstrained optimization

A stochastic first-order method with multi-extrapolated momentum for highly smooth unconstrained optimization

URL: http://arxiv.org/abs/2412.14488v2
Date: Fri, 10 Jan 2025 13:52:14 GMT
Title: A stochastic first-order method with multi-extrapolated momentum for highly smooth unconstrained optimization
Authors: Chuan He,
Abstract summary: We show that the proposed SFOM can accelerate optimization by exploiting the high-order smoothness of the objective function $f$.<n>To the best of our knowledge, this is the first SFOM to leverage arbitrary-order smoothness of the objective function for acceleration.
Score: 3.8919212824749296
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we consider an unconstrained stochastic optimization problem where the objective function exhibits high-order smoothness. Specifically, we propose a new stochastic first-order method (SFOM) with multi-extrapolated momentum, in which multiple extrapolations are performed in each iteration, followed by a momentum update based on these extrapolations. We demonstrate that the proposed SFOM can accelerate optimization by exploiting the high-order smoothness of the objective function $f$. Assuming that the $p$th-order derivative of $f$ is Lipschitz continuous for some $p\ge2$, and under additional mild assumptions, we establish that our method achieves a sample complexity of $\widetilde{\mathcal{O}}(\epsilon^{-(3p+1)/p})$ for finding a point $x$ such that $\mathbb{E}[\|\nabla f(x)\|]\le\epsilon$. To the best of our knowledge, this is the first SFOM to leverage arbitrary-order smoothness of the objective function for acceleration, resulting in a sample complexity that improves upon the best-known results without assuming the mean-squared smoothness condition. Preliminary numerical experiments validate the practical performance of our method and support our theoretical findings.

Related papers

Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization [64.99236464953032]
We propose a new state-of-the-art complexity of $O(/epsilon)$ for finding an (nearly) $'level KKT solution.<n>By applying our hinge-of-the-art complexity of $O(/epsilon)$ for finding an (nearly) $'level KKT solution, we achieve a new state-of-the-art complexity of $O(/epsilon)$ for finding an (nearly) $'level KKT solution.
arXiv Detail & Related papers (2025-06-03T06:31:59Z)
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L0)$-smooth functions and derive new convergence guarantees for several existing methods. In particular, we derive improved convergence rates for Gradient Descent with smoothnessed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z)
Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization [2.0657831823662574]
When the non problem is by which the non problem is by whichity, the sample of first-order methods may depend linearly on the problem dimension, is for undesirable problems. Our algorithms allow for the estimate of complexity using the distance of. mathO (log d) / EuM4. We prove that DISFOM can sharpen variance employing $mathO (log d) / EuM4.
arXiv Detail & Related papers (2024-06-27T18:38:42Z)
A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness [4.097563258332958]
We propose a fast quasi-Newton method when there exists non-uniformity in smoothness. Our algorithm can achieve the best-known $mathcalO(epsilon-3)$ sample complexity and enjoys convergence speedup. Our numerical experiments show that our proposed algorithm outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-22T14:40:29Z)
Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation [22.758674468435302]
In a heavy-tailed noise regime, the difference between the gradient and the true rate is assumed to have a finite $p-th moment. This paper provides a comprehensive analysis of nonsmooth convex optimization with heavy-tailed noises.
arXiv Detail & Related papers (2023-03-22T03:05:28Z)
Variance-reduced Clipping for Non-convex Optimization [24.765794811146144]
Gradient clipping is a technique used in deep learning applications such as large-scale language modeling. Recent experimental training have a fairly special behavior in that it mitigates order complexity.
arXiv Detail & Related papers (2023-03-02T00:57:38Z)
A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available. We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties. We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
arXiv Detail & Related papers (2023-01-26T05:34:21Z)
Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated non functional constraints and use a large number of data points. Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z)
Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of emphcon unconstrained min-max optimization problems. We show that the proposed methods generate iterates that remain within a bounded set and that the iterations converge to an $epsilon$-saddle point within $O(epsilon-2/3)$ in terms of a restricted function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z)
Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$. We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample Complexity [25.775517797956237]
This paper proposes the Doubly Compressed Momentum-assisted tracking algorithm $ttDoCoM$ for communication. We show that our algorithm outperforms several state-of-the-art algorithms in practice.
arXiv Detail & Related papers (2022-02-01T07:27:34Z)
High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level. We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
Riemannian stochastic recursive momentum method for non-convex optimization [36.79189106909088]
We propose atildeOepsilon$-approximate gradient evaluations evaluation method for $mathcalO gradient evaluations with one iteration. Proposed experiment demonstrate the superiority of the algorithm.
arXiv Detail & Related papers (2020-08-11T07:05:58Z)
Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [54.42518331209581]
We find an algorithm which finds. epsilon$-approximate stationary point (with $|nabla F(x)|le epsilon$) using. $(epsilon,gamma)$surimate random random points. Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients. We prove new complexity that outperform state-of-the-art results in this case. We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions. In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.