REX: Revisiting Budgeted Training with an Improved Schedule
- URL: http://arxiv.org/abs/2107.04197v1
- Date: Fri, 9 Jul 2021 04:17:35 GMT
- Title: REX: Revisiting Budgeted Training with an Improved Schedule
- Authors: John Chen, Cameron Wolfe, Anastasios Kyrillidis
- Abstract summary: We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule.
REX outperforms the linear schedule in the low budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules.
- Score: 14.618325490983052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning practitioners often operate on a computational and monetary
budget. Thus, it is critical to design optimization algorithms that perform
well under any budget. The linear learning rate schedule is considered the best
budget-aware schedule, as it outperforms most other schedules in the low budget
regime. On the other hand, learning rate schedules -- such as the
\texttt{30-60-90} step schedule -- are known to achieve high performance when
the model can be trained for many epochs. Yet, it is often not known a priori
whether one's budget will be large or small; thus, the optimal choice of
learning rate schedule is made on a case-by-case basis. In this paper, we frame
the learning rate schedule selection problem as a combination of $i)$ selecting
a profile (i.e., the continuous function that models the learning rate
schedule), and $ii)$ choosing a sampling rate (i.e., how frequently the
learning rate is updated/sampled from this profile). We propose a novel profile
and sampling rate combination called the Reflected Exponential (REX) schedule,
which we evaluate across seven different experimental settings with both SGD
and Adam optimizers. REX outperforms the linear schedule in the low budget
regime, while matching or exceeding the performance of several state-of-the-art
learning rate schedules (linear, step, exponential, cosine, step decay on
plateau, and OneCycle) in both high and low budget regimes. Furthermore, REX
requires no added computation, storage, or hyperparameters.
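The abstract specifies REX only as a profile plus a sampling rate and does not state its closed form. Below is a minimal Python sketch, assuming the reflected-exponential profile $\eta_t = \eta_{\max} \cdot \frac{1 - t/T}{1/2 + (1/2)(1 - t/T)}$ commonly attributed to REX; the function name rex_lr and all constants are illustrative, not taken from the paper.

    def rex_lr(step: int, total_steps: int, max_lr: float = 0.1) -> float:
        # Assumed REX profile: decays from max_lr at step 0 to 0 at the end of
        # the budget, falling slowly early in training and sharply near the end.
        z = 1.0 - step / total_steps          # fraction of the budget remaining
        return max_lr * z / (0.5 + 0.5 * z)

    # The sampling rate decides how often the profile is queried: per-iteration
    # sampling updates the learning rate every step, while per-epoch sampling
    # holds it constant within each epoch.
    total_steps, steps_per_epoch = 1000, 100
    per_iteration = [rex_lr(t, total_steps) for t in range(total_steps)]
    per_epoch = [rex_lr((t // steps_per_epoch) * steps_per_epoch, total_steps)
                 for t in range(total_steps)]

Under this assumed form, either sampling variant needs no state or extra hyperparameters beyond the budget T itself, consistent with the abstract's claim that REX adds no computation, storage, or hyperparameters.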
Related papers
- Optimal Linear Decay Learning Rate Schedules and Further Refinements [46.79573408189601]
Learning rate schedules used in practice bear little resemblance to those recommended by theory.
We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules.
arXiv Detail & Related papers (2023-10-11T19:16:35Z)
- Best Arm Identification for Stochastic Rising Bandits [84.55453174601826]
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected.
This paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs.
We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE and R-SR.
arXiv Detail & Related papers (2023-02-15T08:01:37Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima [40.70374106466073]
We propose a generic learning rate schedule plugin called LEArning Rate Perturbation (LEAP).
LEAP can be applied to various learning rate schedules to improve model training by introducing a certain perturbation to the learning rate.
We conduct extensive experiments which show that training with LEAP can improve the performance of various deep learning models on diverse datasets.
arXiv Detail & Related papers (2022-08-25T05:05:18Z) - Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning.
Compared to the state-of-the-art scheme, the proposed scheme has drastically lower computational complexity.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums [26.44093918424658]
Eigencurve is the first family of learning rate schedules that can achieve minimax optimal convergence rates (up to a constant) for SGD on quadratic objectives.
Experimental results show that Eigencurve can significantly outperform step decay in image classification tasks.
Two simple learning rate schedulers for practical applications can approximate Eigencurve.
arXiv Detail & Related papers (2021-10-27T01:17:53Z)
- Online Stochastic Optimization with Wasserstein Based Non-stationarity [12.91020811577007]
We consider a general online optimization problem with multiple budget constraints over a horizon of finite time periods.
The objective of the decision maker is to maximize the cumulative reward subject to the budget constraints.
This formulation captures a wide range of applications including online linear programming and network revenue management.
arXiv Detail & Related papers (2020-12-13T04:47:37Z)
- Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation [8.340191147575307]
We introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-regressive formulation.
It flexibly adjusts to abrupt changes of behaviours induced by new learning rate values.
It is well-suited to tackle a set of problems: first, for the on-line adaptation of the learning rate for a cold-started run; then, for tuning the schedule for a set of similar tasks, as well as warm-starting it for a new task.
arXiv Detail & Related papers (2020-06-25T13:18:18Z)
- The Two Regimes of Deep Network Training [93.84309968956941]
We study the effects of different learning schedules and the appropriate way to select them.
To this end, we isolate two distinct phases, which we refer to as the "large-step regime" and the "small-step regime".
Our training algorithm can significantly simplify learning rate schedules.
arXiv Detail & Related papers (2020-02-24T17:08:24Z)