Related papers: Accelerated Parameter-Free Stochastic Optimization

URL: http://arxiv.org/abs/2404.00666v2
Date: Fri, 5 Jul 2024 16:15:53 GMT
Title: Accelerated Parameter-Free Stochastic Optimization
Authors: Itai Kreisler, Maor Ivgi, Oliver Hinder, Yair Carmon
Abstract summary: We propose a method that achieves near-optimal rates for smooth convex optimization. It requires essentially no prior knowledge of problem parameters. Our experiments show consistent, strong performance on convex problems and mixed results on neural network training.
Score: 28.705054104155973
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters. This improves on prior work which requires knowing at least the initial distance to optimality d0. Our method, U-DoG, combines UniXGrad (Kavis et al., 2019) and DoG (Ivgi et al., 2023) with novel iterate stabilization techniques. It requires only loose bounds on d0 and the noise magnitude, provides high probability guarantees under sub-Gaussian noise, and is also near-optimal in the non-smooth case. Our experiments show consistent, strong performance on convex problems and mixed results on neural network training.
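To illustrate what "parameter-free" means here, the following is a minimal sketch of a DoG-style step size (distance over gradients), the ingredient credited to Ivgi et al., 2023. It is not U-DoG itself: the UniXGrad acceleration and the iterate stabilization mentioned in the abstract are omitted. The function name `dog_sgd`, the `r_eps` default, and the quadratic test objective are assumptions made for this example only.

```python
import math

def dog_sgd(grad, x0, steps=500, r_eps=1e-4):
    """Gradient descent with a DoG-style step size (illustrative sketch only).

    Step size = (largest distance travelled from x0 so far)
                / sqrt(running sum of squared gradient norms).
    No learning rate or distance-to-optimum estimate is supplied by the user.
    """
    x = list(x0)
    # Assumed small initial "movement" so the first step is nonzero.
    max_dist = r_eps * (1.0 + math.sqrt(sum(v * v for v in x0)))
    grad_sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += sum(v * v for v in g)
        if grad_sq_sum == 0.0:
            break  # started at a stationary point; nothing to do
        eta = max_dist / math.sqrt(grad_sq_sum)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        max_dist = max(max_dist, dist)
    return x

# Hypothetical test problem: f(x) = 0.5 * ||x - target||^2, so grad f(x) = x - target.
target = [3.0, -2.0]
x_final = dog_sgd(lambda x: [xi - ti for xi, ti in zip(x, target)], [0.0, 0.0])
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_final, target)))
```

The step size starts tiny and grows as the iterates move away from the starting point, which is how the rule adapts to the unknown initial distance d0 without being told it.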

Related papers

Benchmarking Lie-Algebraic Pretraining and Non-Variational QWOA for the MaxCut Problem [4.103893081207555]
This paper provides a comparative performance analysis of two strategies designed to improve trainability. We benchmark both methods on the unweighted MaxCut problem using a circuit depth of $p = 256$ across 200 Erdős-Rényi and 200 3-regular graphs. Both approaches significantly improve upon the standard randomly initialized QWOA. NV-QWOA attains a mean approximation ratio of 98.9% in just 60 iterations, while the Lie-algebraic pretrained QWOA improves to 77.71% after 500 iterations.
arXiv Detail & Related papers (2025-12-28T09:42:02Z)
Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization [31.032959636901086]
We propose novel adaptive algorithms for hierarchical optimization problems. Our algorithms achieve sharp convergence rates without prior knowledge of the noise level. Experiments on synthetic and deep learning tasks demonstrate the effectiveness of our proposed algorithms.
arXiv Detail & Related papers (2025-09-18T20:17:18Z)
Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization [21.865728815935665]
We show that Random Reshuffle ($\textsf{RR}$) and Single Shuffle ($\textsf{SS}$) strategies are both provably faster than Proximal GD. As an important implication, we give the first (nearly) optimal convergence result for the suffix average under the $\textsf{RR}$ sampling scheme.
arXiv Detail & Related papers (2025-05-29T03:53:45Z)
Optimal Rates for Robust Stochastic Convex Optimization [12.620782629498812]
We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the $\epsilon$-contamination model. Our algorithms do not require stringent assumptions, including Lipschitz continuity and smoothness of individual sample functions. We complement our algorithmic developments with a tight information-theoretic lower bound for robust SCO.
arXiv Detail & Related papers (2024-12-15T00:52:08Z)
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods. In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z)
Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm has access only to noisy evaluations of the objective function at the points it queries. We provide the first tight characterization of the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration [2.984929040246293]
This paper proposes novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models. The new algorithms retain the ease of implementation of the classical GP-UCB, but an additional exploration step facilitates their convergence.
arXiv Detail & Related papers (2024-01-30T14:16:06Z)
STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic nonconvex optimization problems where the objective is an expectation over smooth loss functions. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability. Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level. We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope. Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients. We prove new complexity bounds that outperform state-of-the-art results in this case. We derive the first non-trivial high-probability complexity bounds for SGD with clipping without a light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
Adaptive First-and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems. Experimental results indicate how the proposed algorithms empirically outperform stochastic gradient descent and its zeroth-order variant.
arXiv Detail & Related papers (2020-05-19T07:44:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.