Better Parameter-free Stochastic Optimization with ODE Updates for
Coin-Betting
- URL: http://arxiv.org/abs/2006.07507v3
- Date: Tue, 3 May 2022 20:03:15 GMT
- Title: Better Parameter-free Stochastic Optimization with ODE Updates for
Coin-Betting
- Authors: Keyi Chen, John Langford, Francesco Orabona
- Abstract summary: Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.
In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models.
We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines without anything to tune.
- Score: 31.60239268539764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-free stochastic gradient descent (PFSGD) algorithms do not require
setting learning rates while achieving optimal theoretical performance. In
practical applications, however, there remains an empirical gap between tuned
stochastic gradient descent (SGD) and PFSGD. In this paper, we close the
empirical gap with a new parameter-free algorithm based on continuous-time
Coin-Betting on truncated models. The new update is derived through the
solution of an Ordinary Differential Equation (ODE) and solved in a closed
form. We show empirically that this new parameter-free algorithm outperforms
algorithms with the "best default" learning rates and almost matches the
performance of finely tuned baselines without anything to tune.
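For context, the coin-betting reduction that parameter-free methods build on can be sketched in a few lines. The snippet below is a minimal, hypothetical Python illustration of the classical per-coordinate Krichevsky-Trofimov (KT) bettor with averaged iterates, assuming gradients clipped to [-1, 1]; it is not the paper's ODE-based update on truncated models, whose closed-form solution is derived in the paper itself.
```python
import numpy as np

def coin_betting_sgd(grad_fn, dim, n_steps, eps=1.0):
    """Per-coordinate Krichevsky-Trofimov (KT) coin-betting optimizer (sketch).

    Classical parameter-free scheme: each coordinate is a bettor whose bet is
    the current iterate and whose "coin outcomes" are the negative gradients.
    Gradients are assumed bounded in [-1, 1] (clipped here for safety).
    This is NOT the paper's ODE-based update on truncated models.
    """
    x = np.zeros(dim)            # current iterate (first bet is 0)
    wealth = np.full(dim, eps)   # initial wealth per coordinate
    coin_sum = np.zeros(dim)     # running sum of coin outcomes c_i = -g_i
    avg = np.zeros(dim)          # online average of the played iterates

    for t in range(1, n_steps + 1):
        avg += (x - avg) / t                 # average iterate (online-to-batch)
        g = np.clip(grad_fn(x), -1.0, 1.0)   # stochastic (sub)gradient at x_t
        wealth -= g * x                      # wealth after the t-th bet
        coin_sum -= g
        x = coin_sum / (t + 1) * wealth      # KT bet: x_{t+1}
    return avg

# Hypothetical usage: minimize f(x) = 0.5 * ||x - 1||^2 with noisy gradients.
rng = np.random.default_rng(0)
target = np.ones(5)
noisy_grad = lambda x: (x - target) + 0.1 * rng.standard_normal(5)
x_hat = coin_betting_sgd(noisy_grad, dim=5, n_steps=5000)  # approaches `target`
```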
Related papers
- Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds [1.6385815610837167]
In this work, we introduce innovative learning-rate-free algorithms for optimization over Riemannian manifolds.
We establish high-probability convergence guarantees that are optimal, up to logarithmic factors, compared to the best-known optimally tuned rates in the deterministic setting.
Our approach is validated through numerical experiments, demonstrating competitive performance against learning-rate-dependent algorithms.
arXiv Detail & Related papers (2024-06-04T13:17:24Z) - A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness [4.097563258332958]
We propose a fast stochastic quasi-Newton method for non-convex objectives with non-uniform smoothness.
Our algorithm can achieve the best-known $\mathcal{O}(\epsilon^{-3})$ sample complexity and enjoys a convergence speedup.
Our numerical experiments show that our proposed algorithm outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-22T14:40:29Z) - Online Learning Under A Separable Stochastic Approximation Framework [20.26530917721778]
We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework.
We show that the proposed algorithm produces more robust and better test performance than other popular learning algorithms.
arXiv Detail & Related papers (2023-05-12T13:53:03Z) - Regret-Aware Black-Box Optimization with Natural Gradients,
Trust-Regions and Entropy Control [17.430247457941284]
Most successful black-box optimizers, such as CMA-ES, use rankings of the individual samples to obtain a new search distribution.
While these algorithms typically produce a high-quality mean estimate of the search distribution, the produced samples can have poor quality as these algorithms are ignorant of the regret.
In contrast, the Model-based Relative Entropy Stochastic Search (MORE) algorithm directly optimizes the expected fitness function without the use of rankings.
arXiv Detail & Related papers (2022-05-24T16:25:15Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
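For illustration, a truncated linear model of the kind these updates act on can be written as below; this is a sketch under the common assumption of a non-negative loss (lower bound 0), not the exact surrogate or "implicit" update defined in that paper.
```python
import numpy as np

def truncated_linear_model(loss_at_xt, grad_at_xt, x_t, x, lower_bound=0.0):
    """Truncated linear surrogate of a loss around the query point x_t (sketch).

    The plain first-order model  loss(x_t) + <grad, x - x_t>  is unbounded below,
    so an update can overshoot; truncating it at a known lower bound of the loss
    (0 for non-negative losses such as hinge or squared loss) prevents that.
    Illustrative only; the exact surrogate and "implicit" update are in the paper.
    """
    linearization = loss_at_xt + np.dot(grad_at_xt, x - x_t)
    return max(linearization, lower_bound)
```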
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic nonconvex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
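For reference, the recursive momentum estimator at the core of STORM can be sketched as follows; the fixed step size and momentum parameter are placeholder assumptions, since STORM+'s contribution is precisely to set them adaptively.
```python
import numpy as np

def storm_step(x, x_prev, d_prev, grad, sample, lr=0.1, a=0.1):
    """One STORM-style recursive-momentum step (illustrative sketch).

    d_t = grad(x_t; xi_t) + (1 - a) * (d_{t-1} - grad(x_{t-1}; xi_t)),
    using the SAME sample xi_t at both points, which is what reduces variance.
    The fixed lr and a here are hypothetical placeholders: STORM+ sets both
    adaptively, and that adaptive schedule is not reproduced in this sketch.
    """
    d = grad(x, sample) + (1.0 - a) * (d_prev - grad(x_prev, sample))
    return x - lr * d, d
```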
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study stochastic optimization of the area under the precision-recall curve (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with an improved iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same $O(1/\epsilon^4)$ complexity, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale, single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
We show that the proposed algorithm asymptotically converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated gradient algorithms to facilitate their fast convergence in practice.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex, nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to simultaneously achieve the best-available convergence rate for non-PL objectives and to outperform existing algorithms for PL (Polyak-Lojasiewicz) objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.