Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions
- URL: http://arxiv.org/abs/2512.20967v1
- Date: Wed, 24 Dec 2025 05:47:27 GMT
- Title: Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions
- Authors: Linggao Kong, Yuedong Xu, Lei Jiao, Chuan Xu,
- Abstract summary: We show the power of prediction in enabling cost-efficient scheduling and its sensitivity to estimation errors.<n>We propose an online allocation algorithm with prediction based on the committed horizon control approach.<n>An online policy selection algorithm is developed that learns the best policy from a pool constructed by varying the parameters of both algorithms.
- Score: 11.849924812127371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As foundation models grow in size, fine-tuning them becomes increasingly expensive. While GPU spot instances offer a low-cost alternative to on-demand resources, their volatile prices and availability make deadline-aware scheduling particularly challenging. We tackle this difficulty by using a mix of spot and on-demand instances. Distinctively, we show the predictability of prices and availability in a spot instance market, the power of prediction in enabling cost-efficient scheduling and its sensitivity to estimation errors. An integer programming problem is formulated to capture the use of mixed instances under both the price and availability dynamics. We propose an online allocation algorithm with prediction based on the committed horizon control approach that leverages a \emph{commitment level} to enforce the partial sequence of decisions. When this prediction becomes inaccurate, we further present a complementary online algorithm without predictions. An online policy selection algorithm is developed that learns the best policy from a pool constructed by varying the parameters of both algorithms. We prove that the prediction-based algorithm achieves tighter performance bounds as prediction error decreases, while the policy selection algorithm possesses a regret bound of $\mathcal{O}(\sqrt{T})$. Experimental results demonstrate that our online framework can adaptively select the best policy under varying spot market dynamics and prediction quality, consistently outperforming baselines and improving utility by up to 54.8\%.
Related papers
- Monitoring State Transitions in Markovian Systems with Sampling Cost [65.4151496405543]
A natural approach is a greedy policy that predicts when the expected prediction loss is below the query cost and queries otherwise.<n>We analyze this policy in a Markovian setting, where the optimal (OPT) strategy is a state-dependent threshold policy.<n>For the case of unknown transition probabilities, we propose a projected gradient descent (PSGD)-based learning variant of the greedy policy.
arXiv Detail & Related papers (2025-10-25T15:07:37Z) - AdaSwitch: An Adaptive Switching Meta-Algorithm for Learning-Augmented Bounded-Influence Problems [9.387255955861162]
We study a class of multi-period online decision-making problems with sequence-based predictions.<n>In each period, the decision-maker observes the realized request and must take an irrevocable action that yields a reward or incurs a cost.<n>We introduce a bounded-influence framework, in which past decisions and requests exert only limited impact on the future optimal reward.<n>Within this framework, we propose the AdaSwitch meta-algorithm, which exploits predictions to attain performance close to the offline benchmark when predictions are accurate.
arXiv Detail & Related papers (2025-09-02T13:26:23Z) - No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need! [56.80767500991973]
We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback.<n>It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time.<n>We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget
arXiv Detail & Related papers (2025-06-16T08:42:31Z) - Non-clairvoyant Scheduling with Partial Predictions [17.387787159892287]
We present a learning-augmented algorithm satisfying the robustness, consistency, and smoothness criteria.
We also present a novel tradeoff between consistency and smoothness inherent in the scenario with a restricted number of predictions.
arXiv Detail & Related papers (2024-05-02T05:29:22Z) - Best of Many in Both Worlds: Online Resource Allocation with Predictions under Unknown Arrival Model [16.466711636334587]
Online decision-makers often obtain predictions on future variables, such as arrivals, demands, and so on.
Prediction accuracy is unknown to decision-makers a priori, hence blindly following the predictions can be harmful.
We develop algorithms that utilize predictions in a manner that is robust to the unknown prediction accuracy.
arXiv Detail & Related papers (2024-02-21T04:57:32Z) - Constrained Online Two-stage Stochastic Optimization: Algorithm with
(and without) Predictions [19.537289123577022]
We consider an online two-stage optimization with long-term constraints over a finite horizon of $T$ periods.
We develop online algorithms for the online two-stage problem from adversarial learning algorithms.
arXiv Detail & Related papers (2024-01-02T07:46:33Z) - Diffusion Variational Autoencoder for Tackling Stochasticity in
Multi-Step Regression Stock Price Prediction [54.21695754082441]
Multi-step stock price prediction over a long-term horizon is crucial for forecasting its volatility.
Current solutions to multi-step stock price prediction are mostly designed for single-step, classification-based predictions.
We combine a deep hierarchical variational-autoencoder (VAE) and diffusion probabilistic techniques to do seq2seq stock prediction.
Our model is shown to outperform state-of-the-art solutions in terms of its prediction accuracy and variance.
arXiv Detail & Related papers (2023-08-18T16:21:15Z) - Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $widetildemathcalO left( frac(k+1)3/2sqrtnT (1-G
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - A Universal Error Measure for Input Predictions Applied to Online Graph
Problems [57.58926849872494]
We introduce a novel measure for quantifying the error in input predictions.
The measure captures errors due to absent predicted requests as well as unpredicted actual requests.
arXiv Detail & Related papers (2022-05-25T15:24:03Z) - Smoothed Online Combinatorial Optimization Using Imperfect Predictions [27.201074566335222]
We study smoothed online optimization problems when an imperfect predictive model is available.
We show that using predictions to plan for a finite time horizon leads to regret dependent on the total predictive uncertainty and an additional switching cost.
Our algorithm shows a significant improvement in cumulative regret compared to other baselines in synthetic online distributed streaming problems.
arXiv Detail & Related papers (2022-04-23T02:30:39Z) - Leveraging Predictions in Smoothed Online Convex Optimization via
Gradient-based Algorithms [18.64335888217192]
We consider online convex optimization with time-varying stage costs and additional switching costs.
Since the switching costs introduce coupling across all stages, long-term predictions tend to suffer from lower quality.
We introduce a gradient-based online algorithm, Receding Horizon Inexact Gradient (RHIG), and analyze its performance by dynamic regrets.
arXiv Detail & Related papers (2020-11-25T06:25:51Z) - Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online
Algorithms [85.97516436641533]
We study the problem of improving the performance of online algorithms by incorporating machine-learned predictions.
The goal is to design algorithms that are both consistent and robust.
We provide the first set of non-trivial lower bounds for competitive analysis using machine-learned predictions.
arXiv Detail & Related papers (2020-10-22T04:51:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.