REX: Revisiting Budgeted Training with an Improved Schedule
- URL: http://arxiv.org/abs/2107.04197v1
- Date: Fri, 9 Jul 2021 04:17:35 GMT
- Title: REX: Revisiting Budgeted Training with an Improved Schedule
- Authors: John Chen, Cameron Wolfe, Anastasios Kyrillidis
- Abstract summary: We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule.
REX outperforms the linear schedule in the low budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules.
- Score: 14.618325490983052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning practitioners often operate on a computational and monetary
budget. Thus, it is critical to design optimization algorithms that perform
well under any budget. The linear learning rate schedule is considered the best
budget-aware schedule, as it outperforms most other schedules in the low budget
regime. On the other hand, learning rate schedules -- such as the
\texttt{30-60-90} step schedule -- are known to achieve high performance when
the model can be trained for many epochs. Yet, it is often not known a priori
whether one's budget will be large or small; thus, the optimal choice of
learning rate schedule is made on a case-by-case basis. In this paper, we frame
the learning rate schedule selection problem as a combination of $i)$ selecting
a profile (i.e., the continuous function that models the learning rate
schedule), and $ii)$ choosing a sampling rate (i.e., how frequently the
learning rate is updated/sampled from this profile). We propose a novel profile
and sampling rate combination called the Reflected Exponential (REX) schedule,
which we evaluate across seven different experimental settings with both SGD
and Adam optimizers. REX outperforms the linear schedule in the low budget
regime, while matching or exceeding the performance of several state-of-the-art
learning rate schedules (linear, step, exponential, cosine, step decay on
plateau, and OneCycle) in both high and low budget regimes. Furthermore, REX
requires no added computation, storage, or hyperparameters.
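The abstract specifies REX only as a profile plus a sampling rate and does not state its closed form. Below is a minimal Python sketch, assuming the reflected-exponential profile $\eta_t = \eta_{\max} \cdot \frac{1 - t/T}{1/2 + (1/2)(1 - t/T)}$ commonly attributed to REX; the function name rex_lr and all constants are illustrative, not taken from the paper.

    def rex_lr(step: int, total_steps: int, max_lr: float = 0.1) -> float:
        # Assumed REX profile: decays from max_lr at step 0 to 0 at the end of
        # the budget, falling slowly early in training and sharply near the end.
        z = 1.0 - step / total_steps          # fraction of the budget remaining
        return max_lr * z / (0.5 + 0.5 * z)

    # The sampling rate decides how often the profile is queried: per-iteration
    # sampling updates the learning rate every step, while per-epoch sampling
    # holds it constant within each epoch.
    total_steps, steps_per_epoch = 1000, 100
    per_iteration = [rex_lr(t, total_steps) for t in range(total_steps)]
    per_epoch = [rex_lr((t // steps_per_epoch) * steps_per_epoch, total_steps)
                 for t in range(total_steps)]

Under this assumed form, either sampling variant needs no state or extra hyperparameters beyond the budget T itself, consistent with the abstract's claim that REX adds no computation, storage, or hyperparameters.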
Related papers
- Optimal Linear Decay Learning Rate Schedules and Further Refinements [46.79573408189601]
Learning rate schedules used in practice bear little resemblance to those recommended by theory.
We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules.
arXiv Detail & Related papers (2023-10-11T19:16:35Z)
- Best Arm Identification for Stochastic Rising Bandits [84.55453174601826]
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected.
This paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs.
We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE and R-SR.
arXiv Detail & Related papers (2023-02-15T08:01:37Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima [40.70374106466073]
We propose a generic learning rate schedule plugin called LEArning Rate Perturbation (LEAP).
LEAP can be applied to various learning rate schedules to improve model training by introducing a certain perturbation to the learning rate.
We conduct extensive experiments which show that training with LEAP can improve the performance of various deep learning models on diverse datasets.
arXiv Detail & Related papers (2022-08-25T05:05:18Z) - Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning.
Compared to the state-of-the-art scheme, the proposed scheme has drastically lower computational complexity.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums [26.44093918424658]
Eigencurve is the first family of learning rate schedules that can achieve minimax optimal convergence rates (up to a constant) for SGD on quadratic objectives.
Experimental results show that Eigencurve can significantly outperform step decay in image classification tasks.
Two simple learning rate schedulers for practical applications can approximate Eigencurve.
arXiv Detail & Related papers (2021-10-27T01:17:53Z)
- Online Stochastic Optimization with Wasserstein Based Non-stationarity [12.91020811577007]
We consider a general online optimization problem with multiple budget constraints over a horizon of finite time periods.
The objective of the decision maker is to maximize the cumulative reward subject to the budget constraints.
This formulation captures a wide range of applications including online linear programming and network revenue management.
arXiv Detail & Related papers (2020-12-13T04:47:37Z)
- Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation [8.340191147575307]
We introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-regressive formulation.
It flexibly adjusts to abrupt changes of behaviours induced by new learning rate values.
It is well-suited to tackle a set of problems: first, for the on-line adaptation of the learning rate for a cold-started run; then, for tuning the schedule for a set of similar tasks, as well as warm-starting it for a new task.
arXiv Detail & Related papers (2020-06-25T13:18:18Z)
- The Two Regimes of Deep Network Training [93.84309968956941]
We study the effects of different learning schedules and the appropriate way to select them.
To this end, we isolate two distinct phases, which we refer to as the "large-step regime" and the "small-step regime".
Our training algorithm can significantly simplify learning rate schedules.
arXiv Detail & Related papers (2020-02-24T17:08:24Z)