Making SGD Parameter-Free
- URL: http://arxiv.org/abs/2205.02160v3
- Date: Fri, 1 Mar 2024 14:45:21 GMT
- Title: Making SGD Parameter-Free
- Authors: Yair Carmon and Oliver Hinder
- Abstract summary: Our algorithm is conceptually simple, has high-probability guarantees, and is partially adaptive to unknown gradient norms, smoothness, and strong convexity.
At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a-priori bounds on SGD iterates.
- Score: 28.088227276891885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop an algorithm for parameter-free stochastic convex optimization
(SCO) whose rate of convergence is only a double-logarithmic factor larger than
the optimal rate for the corresponding known-parameter setting. In contrast,
the best previously known rates for parameter-free SCO are based on online
parameter-free regret bounds, which contain unavoidable excess logarithmic
terms compared to their known-parameter counterparts. Our algorithm is
conceptually simple, has high-probability guarantees, and is also partially
adaptive to unknown gradient norms, smoothness, and strong convexity. At the
heart of our results is a novel parameter-free certificate for SGD step size
choice, and a time-uniform concentration result that assumes no a-priori bounds
on SGD iterates.
Related papers
- The Price of Adaptivity in Stochastic Convex Optimization [23.776027867314628]
We prove results for adaptivity in non-smooth convex optimization.
We define a "price of adaptivity" (PoA) that measures the multiplicative increase in suboptimality.
arXiv Detail & Related papers (2024-02-16T18:56:41Z) - How Free is Parameter-Free Stochastic Optimization? [29.174036532175855]
We study the problem of parameter-free optimization, inquiring whether, and under what conditions, do fully parameter-free methods exist.
Existing methods can only be considered partially'' parameter-free, as they require some non-trivial knowledge of the true problem parameters.
We demonstrate that a simple hyper search technique results in a fully parameter-free method that outperforms more sophisticated state-the-art algorithms.
arXiv Detail & Related papers (2024-02-05T15:51:49Z) - Adiabatic-Passage-Based Parameter Setting for Quantum Approximate
Optimization Algorithm [0.7252027234425334]
We propose a novel adiabatic-passage-based parameter setting method.
This method remarkably reduces the optimization cost, specifically when applied to the 3-SAT problem, to a sublinear level.
arXiv Detail & Related papers (2023-11-30T01:06:41Z) - Parameter-Agnostic Optimization under Relaxed Smoothness [25.608968462899316]
We show that Normalized Gradient Descent with Momentum (NSGD-M) can achieve a rate-optimal complexity without prior knowledge of any problem parameter.
In deterministic settings, the exponential factor can be neutralized by employing Gradient Descent with a Backtracking Line Search.
arXiv Detail & Related papers (2023-11-06T16:39:53Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for
Safety-Critical Applications [71.23286211775084]
We introduce robust Gaussian process uniform error bounds in settings with unknown hyper parameters.
Our approach computes a confidence region in the space of hyper parameters, which enables us to obtain a probabilistic upper bound for the model error.
Experiments show that the bound performs significantly better than vanilla and fully Bayesian processes.
arXiv Detail & Related papers (2021-09-06T17:10:01Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth
Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity with bounds dependence on the confidence level that is either negative-power or logarithmic.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Online Hyperparameter Search Interleaved with Proximal Parameter Updates [9.543667840503739]
We develop a method that relies on the structure of proximal gradient methods and does not require a smooth cost function.
Such a method is applied to Leave-one-out (LOO)-validated Lasso and Group Lasso.
Numerical experiments corroborate the convergence of the proposed method to a local optimum of the LOO validation error curve.
arXiv Detail & Related papers (2020-04-06T15:54:03Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence in rates.
In this paper, we propose an algorithm for solving nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non smoothed and smoothed, single task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.