Making SGD Parameter-Free
- URL: http://arxiv.org/abs/2205.02160v3
- Date: Fri, 1 Mar 2024 14:45:21 GMT
- Title: Making SGD Parameter-Free
- Authors: Yair Carmon and Oliver Hinder
- Abstract summary: Our algorithm is conceptually simple, has high-probability guarantees, and is partially adaptive to unknown gradient norms, smoothness, and strong convexity.
At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a-priori bounds on SGD iterates.
- Score: 28.088227276891885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop an algorithm for parameter-free stochastic convex optimization
(SCO) whose rate of convergence is only a double-logarithmic factor larger than
the optimal rate for the corresponding known-parameter setting. In contrast,
the best previously known rates for parameter-free SCO are based on online
parameter-free regret bounds, which contain unavoidable excess logarithmic
terms compared to their known-parameter counterparts. Our algorithm is
conceptually simple, has high-probability guarantees, and is also partially
adaptive to unknown gradient norms, smoothness, and strong convexity. At the
heart of our results is a novel parameter-free certificate for SGD step size
choice, and a time-uniform concentration result that assumes no a-priori bounds
on SGD iterates.
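As an illustration of the general approach only (not the paper's actual algorithm or certificate), the sketch below runs SGD on a toy stochastic convex quadratic at geometrically increasing candidate step sizes and accepts the largest candidate that passes a simple stand-in "certificate", here an ad hoc trust-region check on how far the iterates travel. The objective, the certificate, and all constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(x):
    # Toy stochastic convex objective f(x) = 0.5 * ||x - x_star||^2 with noisy gradients.
    x_star = np.array([3.0, -2.0])
    return (x - x_star) + 0.1 * rng.standard_normal(x.shape)

def sgd_run(x0, eta, T, radius):
    # Plain SGD with a fixed step size; report the averaged iterate and whether
    # the iterates stayed inside a ball of the given radius around x0.
    x, avg = x0.copy(), np.zeros_like(x0)
    for _ in range(T):
        x = x - eta * stoch_grad(x)
        avg += x / T
        if np.linalg.norm(x - x0) > radius:
            return avg, False  # left the trust region: reject this step size
    return avg, True

def parameter_free_sgd(x0, T=2000, eta0=1e-6, growth=2.0):
    # Stand-in certificate (NOT the paper's): accept the largest step size whose
    # iterates stay inside a ball of radius proportional to eta * T * ||g(x0)||.
    grad_scale = max(np.linalg.norm(stoch_grad(x0)), 1e-12)
    eta, best, ok = eta0, x0, True
    while ok:
        avg, ok = sgd_run(x0, eta, T, radius=10.0 * eta * T * grad_scale)
        if ok:
            best, eta = avg, eta * growth
    return best

print("approximate minimizer:", parameter_free_sgd(np.zeros(2)))
```

The point of the sketch is the search structure: no step size, gradient bound, or distance to the optimum is supplied by the user; a per-candidate check decides when to stop growing the step size.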
Related papers
- Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions [18.47705532817026]
We show that AdaGrad outperforms SGD by a factor of $d$ under certain conditions.
Motivated by this, we introduce assumptions on the smoothness structure of the objective and the gradient variance.
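For context, the sketch below shows a minimal diagonal AdaGrad update on a toy badly-scaled quadratic; the objective, step size, and constants are illustrative and are not taken from the cited paper.

```python
import numpy as np

def adagrad(grad, x0, eta=1.0, eps=1e-8, T=500):
    # Diagonal AdaGrad: per-coordinate step sizes shrink with the running sum of
    # squared gradients, so poorly scaled coordinates are rescaled automatically.
    x, g2 = x0.copy(), np.zeros_like(x0)
    for _ in range(T):
        g = grad(x)
        g2 += g * g
        x -= eta * g / (np.sqrt(g2) + eps)
    return x

# Toy badly scaled quadratic f(x) = 0.5 * x^T diag(scales) x; minimizer at 0.
scales = np.array([1.0, 100.0])
print(adagrad(lambda x: scales * x, x0=np.array([5.0, 5.0])))
```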
arXiv Detail & Related papers (2024-06-07T02:55:57Z) - The Price of Adaptivity in Stochastic Convex Optimization [23.776027867314628]
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization.
We define a "price of adaptivity" (PoA) that measures the multiplicative increase in suboptimality.
arXiv Detail & Related papers (2024-02-16T18:56:41Z) - How Free is Parameter-Free Stochastic Optimization? [29.174036532175855]
We study the problem of parameter-free optimization, inquiring whether, and under what conditions, fully parameter-free methods exist.
Existing methods can only be considered "partially" parameter-free, as they require some non-trivial knowledge of the true problem parameters.
We demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms.
arXiv Detail & Related papers (2024-02-05T15:51:49Z) - Adiabatic-Passage-Based Parameter Setting for Quantum Approximate
Optimization Algorithm [0.7252027234425334]
We propose a novel adiabatic-passage-based parameter setting method.
When applied to the 3-SAT problem, this method remarkably reduces the optimization cost to a sublinear level.
arXiv Detail & Related papers (2023-11-30T01:06:41Z) - Parameter-Agnostic Optimization under Relaxed Smoothness [25.608968462899316]
We show that Normalized Gradient Descent with Momentum (NSGD-M) can achieve a rate-optimal complexity without prior knowledge of any problem parameter.
In deterministic settings, the exponential factor can be neutralized by employing Gradient Descent with a Backtracking Line Search.
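The sketch below is a generic textbook version of gradient descent with an Armijo backtracking line search on a toy smooth objective whose gradient Lipschitz constant is not assumed known; it is not the cited paper's exact procedure, and the objective and constants are hypothetical.

```python
import numpy as np

def gd_backtracking(f, grad, x0, eta0=1.0, beta=0.5, c=1e-4, T=100):
    # Gradient descent with Armijo backtracking: shrink the step size until the
    # sufficient-decrease condition f(x - eta*g) <= f(x) - c*eta*||g||^2 holds.
    x = x0.copy()
    for _ in range(T):
        g = grad(x)
        eta = eta0
        while f(x - eta * g) > f(x) - c * eta * (g @ g):
            eta *= beta
        x = x - eta * g
    return x

# Toy smooth objective f(x) = log(1 + ||x||^2); no step size is tuned by hand.
f = lambda x: np.log1p(x @ x)
grad = lambda x: 2 * x / (1 + x @ x)
print(gd_backtracking(f, grad, np.array([4.0, -3.0])))
```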
arXiv Detail & Related papers (2023-11-06T16:39:53Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic nonconvex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for
Safety-Critical Applications [71.23286211775084]
We introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters.
Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error.
Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.
arXiv Detail & Related papers (2021-09-06T17:10:01Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees for SGDA and stochastic consensus optimization (SCO).
We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
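As a minimal illustration (not the cited paper's setting), the sketch below runs constant step-size SGDA on a toy strongly-convex-strongly-concave quadratic game with noisy gradients; the iterates settle into a neighborhood of the saddle point, mirroring the constant step-size behavior described above. The game, noise level, and step size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgda(x, y, eta=0.05, T=5000, noise=0.1):
    # Simultaneous SGDA on f(x, y) = 0.5*||x||^2 + x.y - 0.5*||y||^2, a smooth
    # strongly-convex-strongly-concave game whose unique saddle point is (0, 0).
    for _ in range(T):
        gx = x + y + noise * rng.standard_normal(x.shape)  # noisy grad_x f
        gy = x - y + noise * rng.standard_normal(y.shape)  # noisy grad_y f
        x, y = x - eta * gx, y + eta * gy                  # descent in x, ascent in y
    return x, y

print(sgda(np.array([2.0, -1.0]), np.array([1.0, 3.0])))
```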
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence.
In this paper, we propose an algorithm for solving nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.