The Road Less Scheduled
- URL: http://arxiv.org/abs/2405.15682v2
- Date: Thu, 30 May 2024 21:50:15 GMT
- Title: The Road Less Scheduled
- Authors: Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky
- Abstract summary: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T.
We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely.
- Score: 75.09232139131437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available (https://github.com/facebookresearch/schedule_free).
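The abstract describes the method only in words, so here is a minimal sketch of a Schedule-Free SGD loop, assuming the update rule is as we understand it from the paper: gradients are evaluated at an interpolation `y` between a base SGD sequence `z` and its running uniform average `x`, and `x` is the point returned. The function names, the toy quadratic objective, and the chosen `lr`, `beta`, and `steps` values are illustrative assumptions. This is not the reference implementation from the repository linked above.

```python
def schedule_free_sgd(grad, x0, lr=1.0, beta=0.9, steps=500):
    """Sketch of a Schedule-Free SGD loop on a list of floats (hypothetical helper)."""
    z = list(x0)  # base SGD sequence
    x = list(x0)  # running uniform average of z; this is the returned point
    for t in range(1, steps + 1):
        # Evaluate the gradient at an interpolation between z and x.
        y = [(1 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
        g = grad(y)
        # Plain SGD step on the base sequence, with a constant learning rate.
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        # Online uniform averaging with weight c = 1 / (t + 1); no schedule
        # and no stopping step T appear anywhere in the loop.
        c = 1.0 / (t + 1)
        x = [(1 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x


def quadratic_grad(w):
    # Gradient of the toy objective f(w) = 0.5 * sum((w_i - 3)^2).
    return [wi - 3.0 for wi in w]


result = schedule_free_sgd(quadratic_grad, [0.0, 10.0])
print(result)
```

On this toy problem the averaged iterate should approach the minimizer at 3 in each coordinate. The only hyper-parameters in the sketch are the learning rate and the momentum-like interpolation weight `beta`, which matches the abstract's claim of no additional hyper-parameters over optimizers with momentum.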
Related papers
- Differentiable Combinatorial Scheduling at Scale [18.09256072039255]
We propose a differentiable scheduling framework that uses the Gumbel-Softmax differentiable sampling technique.
To encode inequality constraints for scheduling tasks, we introduce the constrained Gumbel Trick, which adeptly encodes arbitrary inequality constraints.
Our method facilitates efficient and scalable scheduling via gradient descent without the need for training data.
arXiv Detail & Related papers (2024-06-06T02:09:39Z)
- Locally Optimal Descent for Dynamic Stepsize Scheduling [45.6809308002043]
We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice.
Our approach is based on estimating a locally optimal stepsize in the direction of the gradient.
Our findings indicate that our method needs minimal tuning when compared to existing approaches.
arXiv Detail & Related papers (2023-11-23T09:57:35Z)
- When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement [51.12097770185634]
Learning rate schedules used in practice bear little resemblance to those recommended by theory.
We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules.
arXiv Detail & Related papers (2023-10-11T19:16:35Z)
- Accelerating Exact Combinatorial Optimization via RL-based Initialization -- A Case Study in Scheduling [1.3053649021965603]
This research aims to develop an innovative approach that employs machine learning (ML) for addressing optimization problems.
We introduce a novel two-phase RL-to-ILP scheduling framework, which includes three steps: 1) solver as coarse-grain scheduler, 2) solution relaxation and 3) exact solving via ILP.
Our framework demonstrates the same scheduling performance as exact scheduling methods while achieving up to 128× speed improvements.
arXiv Detail & Related papers (2023-08-19T15:52:43Z)
- Mechanic: A Learning Rate Tuner [52.4242550204696]
We introduce a technique, which we call Mechanic, for automatically tuning the learning rate scale factor of any base optimization algorithm and schedule.
We rigorously evaluate Mechanic on a range of large-scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms.
arXiv Detail & Related papers (2023-05-31T19:32:43Z)
- Robust Scheduling with GFlowNets [6.6908747077585105]
We propose a new approach to scheduling by sampling proportionally to the proxy metric using a novel GFlowNet method.
We introduce a technique to control the trade-off between diversity and goodness of the proposed schedules at inference time.
arXiv Detail & Related papers (2023-01-17T18:59:15Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)