A new regret analysis for Adam-type algorithms
- URL: http://arxiv.org/abs/2003.09729v1
- Date: Sat, 21 Mar 2020 19:19:51 GMT
- Title: A new regret analysis for Adam-type algorithms
- Authors: Ahmet Alacaoglu, Yura Malitsky, Panayotis Mertikopoulos, Volkan Cevher
- Abstract summary: In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_{1}\to0$ schedule.
We propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_{1}$.
- Score: 78.825194932103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on a theory-practice gap for Adam and its variants
(AMSgrad, AdamNC, etc.). In practice, these algorithms are used with a constant
first-order moment parameter $\beta_{1}$ (typically between $0.9$ and $0.99$).
In theory, regret guarantees for online convex optimization require a rapidly
decaying $\beta_{1}\to0$ schedule. We show that this is an artifact of the
standard analysis and propose a novel framework that allows us to derive
optimal, data-dependent regret bounds with a constant $\beta_{1}$, without
further assumptions. We also demonstrate the flexibility of our analysis on a
wide range of different algorithms and settings.
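To make the analyzed setting concrete, below is a minimal sketch (not the authors' code) of an AMSGrad-style update with a constant $\beta_{1}=0.9$ on a toy online convex problem; the quadratic losses, step-size schedule, and hyperparameters are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

# Toy online convex problem: at round t the loss is f_t(x) = 0.5 * ||x - z_t||^2
# for a target z_t revealed after the prediction. The update below is an
# AMSGrad-style step with a *constant* beta1, the regime studied in the paper.
rng = np.random.default_rng(0)
d, T = 5, 1000
targets = rng.normal(size=(T, d))             # z_t for each round (illustrative data)

alpha, beta1, beta2, eps = 0.5, 0.9, 0.999, 1e-8
x = np.zeros(d)
m = np.zeros(d)                               # first moment, constant beta1 (no decay to 0)
v = np.zeros(d)                               # second moment
v_hat = np.zeros(d)                           # running max of v (AMSGrad correction)

cum_loss = 0.0
for t in range(T):
    cum_loss += 0.5 * np.sum((x - targets[t]) ** 2)   # loss incurred by the prediction x_t
    g = x - targets[t]                                # gradient of f_t at x_t
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    v_hat = np.maximum(v_hat, v)                      # keeps effective step sizes non-increasing
    x = x - alpha / np.sqrt(t + 1) * m / (np.sqrt(v_hat) + eps)

# Regret against the best fixed point in hindsight (the mean target for these quadratics).
x_star = targets.mean(axis=0)
best_fixed_loss = 0.5 * np.sum((targets - x_star) ** 2)
print("cumulative regret:", cum_loss - best_fixed_loss)
```

The paper's claim is that optimal, data-dependent regret bounds hold for constant-$\beta_{1}$ updates of this kind without further assumptions; the toy run above merely illustrates the update rule, not the bound itself.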
Related papers
- ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate [21.378608502899077]
We propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of $\mathcal{O}(1/\sqrt{T})$ with any choice of $\beta_2$, without depending on the bounded noise assumption.
Our ADOPT achieves superior results compared to Adam and its variants across a wide range of tasks, including image classification, generative modeling, natural language processing, and deep reinforcement learning.
arXiv Detail & Related papers (2024-11-05T06:57:47Z)
- Adam-like Algorithm with Smooth Clipping Attains Global Minima: Analysis Based on Ergodicity of Functional SDEs [0.0]
We show that an Adam-type algorithm with smooth clipping attains the global minimizer of the regularized non-convex loss function.
We also apply the ergodic theory of functional SDEs to investigate how to choose the inverse temperature and the running time.
arXiv Detail & Related papers (2023-11-29T14:38:59Z)
- Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians.
Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches.
Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$.
arXiv Detail & Related papers (2023-10-28T13:01:10Z)
- Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z)
- Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
This paper investigates the delayed online convex optimization (OCO) in non-stationary environments.
We first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient according to their arrival order.
We develop an improved algorithm, which reduces those dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
arXiv Detail & Related papers (2023-05-20T07:54:07Z)
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z)
- A Near-Optimal Algorithm for Univariate Zeroth-Order Budget Convex Optimization [4.608510640547952]
We prove near-optimal optimization error guarantees for Dy Search.
We show that the classical dependence on the global Lipschitz constant in the error bounds is an artifact of the granularity of the budget.
arXiv Detail & Related papers (2022-08-13T19:57:04Z)
- An Online Riemannian PCA for Stochastic Canonical Correlation Analysis [37.8212762083567]
We present an efficient algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices.
While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising.
arXiv Detail & Related papers (2021-06-08T23:38:29Z)
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point using an optimal number of stochastic gradient and Hessian-vector product evaluations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
- A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence covering both Adam and Adagrad, with a rate of $O(d\ln(N)/\sqrt{N})$.
Adam converges with the same $O(d\ln(N)/\sqrt{N})$ rate when used with its default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.