The Road Less Scheduled
- URL: http://arxiv.org/abs/2405.15682v4
- Date: Tue, 29 Oct 2024 22:40:23 GMT
- Title: The Road Less Scheduled
- Authors: Aaron Defazio, Xingyu Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky
- Abstract summary: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T.
We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely.
Our Schedule-Free approach introduces no additional hyperparameters over standard optimizers with momentum.
- Score: 45.01813613035411
- Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyperparameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available at https://github.com/facebookresearch/schedule_free. Schedule-Free AdamW is the core algorithm behind our winning entry to the MLCommons 2024 AlgoPerf Algorithmic Efficiency Challenge Self-Tuning track.
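For readers who want to try the method, a minimal usage sketch follows. It assumes the `schedulefree` package from the linked repository and its documented `AdamWScheduleFree` optimizer with explicit `train()`/`eval()` mode switching; consult the repository README for the current API.

```python
# Sketch: training with Schedule-Free AdamW, with no learning rate schedule.
# Assumes the `schedulefree` package from the repository above; the exact
# API may differ between versions, so consult the README.
import torch
import schedulefree

model = torch.nn.Linear(10, 2)
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=2.5e-3)

optimizer.train()  # switch the optimizer (and weights) into training mode
for step in range(1000):
    x = torch.randn(32, 10)
    y = torch.randint(0, 2, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

optimizer.eval()  # switch to the averaged iterate before evaluation or checkpointing
```

The mode switch matters: gradients are taken at an interpolation of the averaged and base iterates, so the weights used during training differ from the averaged weights used for evaluation.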
Related papers
- Differentiable Combinatorial Scheduling at Scale [18.09256072039255]
We propose a differentiable scheduling framework that utilizes the Gumbel-Softmax differentiable sampling technique.
To encode inequality constraints for scheduling tasks, we introduce the *constrained Gumbel Trick*, which adeptly encodes arbitrary inequality constraints.
Our method enables efficient and scalable scheduling via gradient descent without the need for training data.
arXiv Detail & Related papers (2024-06-06T02:09:39Z)
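For context, here is a minimal sketch of the Gumbel-Softmax sampling that the framework above builds on; it is a generic illustration, not the paper's scheduler, and the task/slot setup is invented:

```python
import torch

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Draw a differentiable, approximately one-hot sample from `logits`."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))  # Gumbel(0, 1) noise
    return torch.softmax((logits + gumbel) / tau, dim=-1)

# Example: a soft assignment of one task to one of four time slots.
slot_logits = torch.zeros(4, requires_grad=True)
soft_assignment = gumbel_softmax_sample(slot_logits, tau=0.5)
soft_assignment.sum().backward()  # gradients flow back to the logits
```

PyTorch also provides `torch.nn.functional.gumbel_softmax`, including a `hard=True` option for straight-through one-hot samples.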
- Locally Optimal Descent for Dynamic Stepsize Scheduling [45.6809308002043]
We introduce a novel dynamic learning-rate scheduling scheme, grounded in theory, that aims to simplify the manual and time-consuming tuning of schedules in practice.
Our approach is based on estimating a locally-optimal stepsize in the direction of the current stochastic gradient.
Our findings indicate that our method needs minimal tuning when compared to existing approaches.
arXiv Detail & Related papers (2023-11-23T09:57:35Z)
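As a generic illustration of locally-optimal stepsize estimation (not the paper's estimator), one can probe the loss along the descent direction and minimize a fitted one-dimensional quadratic; the probe size and fallback below are arbitrary choices:

```python
import numpy as np

def local_quadratic_stepsize(loss, grad, w, probe=1e-3):
    """Estimate a locally optimal stepsize along -grad via a 1-D quadratic fit."""
    g = grad(w)
    d = -g / (np.linalg.norm(g) + 1e-12)           # unit descent direction
    f0, f1, f2 = loss(w), loss(w + probe * d), loss(w + 2 * probe * d)
    slope = (f1 - f0) / probe                      # finite-difference slope
    curv = (f2 - 2 * f1 + f0) / probe**2           # finite-difference curvature
    if curv <= 0:
        return probe                               # fall back when not locally convex
    return -slope / curv                           # minimizer of the fitted quadratic

# Example on a quadratic bowl, where the estimate is near-exact.
loss = lambda w: 0.5 * w @ w
grad = lambda w: w
w = np.array([3.0, -4.0])
eta = local_quadratic_stepsize(loss, grad, w)      # ~5.0 = distance to the optimum
w_next = w + eta * (-grad(w) / np.linalg.norm(grad(w)))
```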
- Optimal Linear Decay Learning Rate Schedules and Further Refinements [46.79573408189601]
Learning rate schedules used in practice bear little resemblance to those recommended by theory.
We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules.
arXiv Detail & Related papers (2023-10-11T19:16:35Z)
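The simplest member of the schedule family analyzed there is linear decay to zero over a known horizon T. A minimal PyTorch sketch, with placeholder base rate and horizon:

```python
import torch

model = torch.nn.Linear(10, 2)
base_lr, T = 0.1, 1000  # placeholder base rate and stopping step

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)
# lr(t) = base_lr * (1 - t / T): linear decay from base_lr to 0 at step T.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda t: max(0.0, 1 - t / T))

for step in range(T):
    # ... forward pass, loss.backward(), then:
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```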
- Accelerating Exact Combinatorial Optimization via RL-based Initialization -- A Case Study in Scheduling [1.3053649021965603]
This research aims to develop an innovative approach that employs machine learning (ML) for addressing optimization problems.
We introduce a novel two-phase RL-to-ILP scheduling framework with three steps: 1) an RL solver acts as a coarse-grain scheduler, 2) the solution is relaxed, and 3) the problem is solved exactly via ILP.
Our framework matches the scheduling performance of exact scheduling methods while achieving up to 128× speedups.
arXiv Detail & Related papers (2023-08-19T15:52:43Z)
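The pipeline above is hard to show faithfully without the paper's solver stack, but its warm-start-then-exact pattern can be illustrated on a toy single-machine problem. Everything below is a stand-in: a greedy shortest-processing-time heuristic plays the role of the RL coarse scheduler, exhaustive search plays the role of the ILP solver, and the relaxation step is omitted.

```python
from itertools import permutations

def greedy_warm_start(durations):
    """Stand-in for the RL coarse scheduler: shortest-processing-time order."""
    return sorted(range(len(durations)), key=lambda i: durations[i])

def total_completion_time(order, durations):
    t, total = 0, 0
    for i in order:
        t += durations[i]
        total += t
    return total

def exact_schedule(durations):
    """Stand-in for the ILP phase: exhaustive search seeded with the warm start."""
    best = greedy_warm_start(durations)                 # phase 1: coarse solution
    best_cost = total_completion_time(best, durations)
    for order in permutations(range(len(durations))):   # phase 2/3: exact solve
        cost = total_completion_time(order, durations)
        if cost < best_cost:
            best, best_cost = list(order), cost
    return best, best_cost

print(exact_schedule([3, 1, 2]))  # ([1, 2, 0], 10)
```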
- Mechanic: A Learning Rate Tuner [52.4242550204696]
We introduce Mechanic, a technique for automatically tuning the learning-rate scale factor of any base optimization algorithm and schedule.
We rigorously evaluate Mechanic on a range of large-scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms.
arXiv Detail & Related papers (2023-05-31T19:32:43Z)
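A usage sketch, assuming the reference implementation (installable as `mechanic-pytorch`) and its documented `mechanize` wrapper; the import path should be verified against the repository:

```python
# Sketch: wrapping a base optimizer with Mechanic's learning-rate scale tuner.
# Assumes `pip install mechanic-pytorch`; verify the import against the repo.
import torch
from mechanic_pytorch import mechanize

model = torch.nn.Linear(10, 2)
# Mechanic learns a scale factor on top of whatever base lr/schedule you pass in.
optimizer = mechanize(torch.optim.SGD)(model.parameters(), lr=1e-2)

for step in range(100):
    loss = model(torch.randn(32, 10)).square().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```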
- Robust Scheduling with GFlowNets [6.6908747077585105]
We propose a new approach to scheduling that samples schedules in proportion to a proxy performance metric using a novel GFlowNet method.
We introduce a technique to control the trade-off between diversity and goodness of the proposed schedules at inference time.
arXiv Detail & Related papers (2023-01-17T18:59:15Z)
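The diversity-versus-goodness control described there can be pictured as tempered sampling, p(schedule) ∝ reward^(1/T): T = 1 recovers proportional sampling, while lower T concentrates on the best schedules. A generic illustration (not the GFlowNet machinery itself):

```python
import numpy as np

def tempered_sample(rewards, temperature, rng=np.random.default_rng(0)):
    """Sample an index with p(i) proportional to rewards[i] ** (1 / temperature)."""
    logits = np.log(np.asarray(rewards)) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(rewards), p=probs)

rewards = [1.0, 2.0, 10.0]            # proxy metric for three candidate schedules
print(tempered_sample(rewards, 1.0))  # proportional sampling: diverse
print(tempered_sample(rewards, 0.1))  # low temperature: concentrates on the best
```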
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
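A minimal sketch of the censored, runtime-oriented loss idea: observed runtimes are capped at a timeout, and a UCB-style bandit selects among candidate algorithms. The runtimes below are simulated and the bandit is a generic one, not the adaptations proposed in the paper.

```python
import math, random

def select_ucb(means, counts, t, timeout):
    """UCB over algorithms; play each arm once, then minimize optimistic runtime."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return min(range(len(means)),
               key=lambda i: means[i] - timeout * math.sqrt(2 * math.log(t) / counts[i]))

timeout, K = 10.0, 3
means, counts = [0.0] * K, [0] * K
for t in range(1, 201):
    i = select_ucb(means, counts, t, timeout)
    # Simulated runtime, censored at the timeout: we never observe anything longer.
    runtime = min(random.expovariate(1.0 / (2 + 3 * i)), timeout)
    counts[i] += 1
    means[i] += (runtime - means[i]) / counts[i]  # running mean of censored runtimes
print(counts)  # the fastest algorithm (index 0) should dominate
```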
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
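"Improper" here means the learned scheduler is a mixture of the given atomic policies rather than an arbitrary new policy. A toy illustration of tuning such a mixture with a REINFORCE-style gradient follows; the two "atomic policies" and their rewards are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)  # mixture logits over two fixed atomic policies
lr = 0.1

def atomic_policy_reward(k):
    """Stand-in atomic policies: policy 1 yields higher reward (lower delay)."""
    return rng.normal(loc=[0.2, 0.8][k], scale=0.1)

for step in range(2000):
    probs = np.exp(theta - theta.max()); probs /= probs.sum()
    k = rng.choice(2, p=probs)        # pick an atomic policy from the mixture
    r = atomic_policy_reward(k)
    grad = -probs; grad[k] += 1.0     # d log pi(k) / d theta for a softmax mixture
    theta += lr * r * grad            # REINFORCE update on the mixture weights

print(probs)  # mass concentrates on the better atomic policy
```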
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling the automatic primary response (APR) of generators within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.