Accelerated Parameter-Free Stochastic Optimization
- URL: http://arxiv.org/abs/2404.00666v2
- Date: Fri, 5 Jul 2024 16:15:53 GMT
- Title: Accelerated Parameter-Free Stochastic Optimization
- Authors: Itai Kreisler, Maor Ivgi, Oliver Hinder, Yair Carmon,
- Abstract summary: We propose a method that achieves near-optimal rates for smooth convex optimization.
It requires essentially no prior knowledge of problem parameters.
Our experiments show consistent, strong performance on convex problems and mixed results on neural network training.
- Score: 28.705054104155973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters. This improves on prior work which requires knowing at least the initial distance to optimality d0. Our method, U-DoG, combines UniXGrad (Kavis et al., 2019) and DoG (Ivgi et al., 2023) with novel iterate stabilization techniques. It requires only loose bounds on d0 and the noise magnitude, provides high probability guarantees under sub-Gaussian noise, and is also near-optimal in the non-smooth case. Our experiments show consistent, strong performance on convex problems and mixed results on neural network training.
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Improving Quantum Approximate Optimization by Noise-Directed Adaptive Remapping [3.47862118034022]
We present emphNoise-Directed Adaptive Remapping (NDAR), a meta-algorithm for approximately solving binary optimization problems by leveraging certain types of noise.
We demonstrate the effectiveness of our protocol in experiments with subsystems of the newest generation of Rigetti Computing's superconducting device Ankaa-2.
arXiv Detail & Related papers (2024-04-01T18:28:57Z) - Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration [2.984929040246293]
novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models.
New algorithms retain the ease of implementation of the classical GP-UCB, but an additional exploration step facilitates their convergence.
arXiv Detail & Related papers (2024-01-30T14:16:06Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth
Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity with bounds dependence on the confidence level that is either negative-power or logarithmic.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.