Smoothed Online Optimization for Target Tracking: Robust and Learning-Augmented Algorithms
- URL: http://arxiv.org/abs/2509.05930v1
- Date: Sun, 07 Sep 2025 05:22:41 GMT
- Title: Smoothed Online Optimization for Target Tracking: Robust and Learning-Augmented Algorithms
- Authors: Ali Zeynali, Mahsa Sahebdel, Qingsong Liu, Mohammad Hajiesmaili, Ramesh K. Sitaraman
- Abstract summary: We introduce a new framework that integrates three key objectives in online decision-making under uncertainty. We first present BEST, a robust algorithm with provable competitive guarantees for SOOTT. To enhance practical performance, we introduce CoRT, a learning-augmented variant that incorporates untrusted black-box predictions.
- Score: 14.970453355735225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the Smoothed Online Optimization for Target Tracking (SOOTT) problem, a new framework that integrates three key objectives in online decision-making under uncertainty: (1) tracking cost for following a dynamically moving target, (2) adversarial perturbation cost for withstanding unpredictable disturbances, and (3) switching cost for penalizing abrupt changes in decisions. This formulation captures real-world scenarios such as elastic and inelastic workload scheduling in AI clusters, where operators must balance long-term service-level agreements (e.g., LLM training) against sudden demand spikes (e.g., real-time inference). We first present BEST, a robust algorithm with provable competitive guarantees for SOOTT. To enhance practical performance, we introduce CoRT, a learning-augmented variant that incorporates untrusted black-box predictions (e.g., from ML models) into its decision process. Our theoretical analysis shows that CoRT strictly improves over BEST when predictions are accurate, while maintaining robustness under arbitrary prediction errors. We validate our approach through a case study on workload scheduling, demonstrating that both algorithms effectively balance trajectory tracking, decision smoothness, and resilience to external disturbances.
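To make the three cost terms concrete, here is a minimal Python sketch of a single SOOTT round. The cost forms (quadratic tracking and switching penalties, a linear adversarial perturbation term), the function names, and the `trust` blending knob are illustrative assumptions; the paper's actual BEST and CoRT algorithms are not reproduced here.

```python
import numpy as np

def soott_round_cost(x, x_prev, target, w, track_w=1.0, switch_w=0.5):
    """Per-round SOOTT cost under illustrative (assumed) cost choices."""
    tracking = track_w * np.sum((x - target) ** 2)    # (1) follow the moving target
    perturbation = float(np.dot(w, x))                # (2) adversarial disturbance term
    switching = switch_w * np.sum((x - x_prev) ** 2)  # (3) penalize abrupt changes
    return tracking + perturbation + switching

def cort_style_decision(robust_x, predicted_x, trust):
    """Toy blend of a robust baseline action with an untrusted black-box
    prediction; `trust` in [0, 1] is a hypothetical knob, not a parameter
    taken from the paper's CoRT algorithm."""
    return (1.0 - trust) * robust_x + trust * predicted_x

# Toy horizon: the target moves on a circle, disturbances are random.
rng = np.random.default_rng(0)
x_prev, total = np.zeros(2), 0.0
for t in range(100):
    target = np.array([np.cos(0.1 * t), np.sin(0.1 * t)])
    w = 0.1 * rng.standard_normal(2)
    x = cort_style_decision(robust_x=x_prev, predicted_x=target, trust=0.8)
    total += soott_round_cost(x, x_prev, target, w)
    x_prev = x
print(f"total cost over the horizon: {total:.3f}")
```

Raising `trust` toward 1 follows the prediction tightly (consistency), while lowering it toward 0 falls back to a smooth, conservative trajectory (robustness), mirroring the trade-off the paper formalizes for CoRT.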
Related papers
- Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs. We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation. To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy.
arXiv Detail & Related papers (2026-03-04T14:48:53Z)
- Unifying Stable Optimization and Reference Regularization in RLHF [64.16830602324345]
This paper introduces a unified regularization approach that balances the objectives of preventing reward hacking and maintaining stable policy updates. Our simple yet principled alignment objective yields a weighted supervised fine-tuning loss with a superior trade-off, demonstrably improving alignment results while reducing implementation complexity.
arXiv Detail & Related papers (2026-02-12T03:31:19Z)
- Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions [11.849924812127371]
We show the power of prediction in enabling cost-efficient scheduling and its sensitivity to estimation errors. We propose an online allocation algorithm with prediction based on the committed horizon control approach. An online policy selection algorithm is developed that learns the best policy from a pool constructed by varying the parameters of both algorithms.
arXiv Detail & Related papers (2025-12-24T05:47:27Z)
- Dynamic Epsilon Scheduling: A Multi-Factor Adaptive Perturbation Budget for Adversarial Training [1.5558386948322986]
Adversarial training is one of the most effective strategies for defending neural networks against adversarial examples. Existing adversarial training approaches rely on a fixed perturbation budget, which fails to account for instance-specific robustness characteristics. We propose Dynamic Epsilon Scheduling (DES), a novel framework that adaptively adjusts the adversarial perturbation budget per instance and per training iteration.
arXiv Detail & Related papers (2025-06-03T04:18:53Z)
- Global-Decision-Focused Neural ODEs for Proactive Grid Resilience Management [50.34345101758248]
We propose predict-all-then-optimize-globally (PATOG), a framework that integrates outage prediction with globally optimized interventions. Our approach ensures spatially and temporally coherent decision-making, improving both predictive accuracy and operational efficiency. Experiments on synthetic and real-world datasets demonstrate significant improvements in outage prediction consistency and grid resilience.
arXiv Detail & Related papers (2025-02-25T16:15:35Z)
- Optimistic Online Non-stochastic Control via FTRL [10.25772015681554]
This paper brings the concept of "optimism" to the new and promising framework of online Non-stochastic Control (NSC).
By addressing the challenge of incorporating untrusted predictions into online control, this work contributes to the advancement of the NSC framework.
arXiv Detail & Related papers (2024-04-04T09:08:04Z)
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of computing the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike fixed exploration and exploitation balance, caution and probing are employed automatically by the controller in real-time, even after the learning process is terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
- Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online Algorithms [85.97516436641533]
We study the problem of improving the performance of online algorithms by incorporating machine-learned predictions.
The goal is to design algorithms that are both consistent and robust.
We provide the first set of non-trivial lower bounds for competitive analysis using machine-learned predictions; a ski-rental sketch illustrating the consistency-robustness trade-off appears after this list.
arXiv Detail & Related papers (2020-10-22T04:51:01Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
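The robustness-consistency terminology in the learning-augmented online algorithms entry above is easiest to see on the canonical ski-rental problem. The sketch below follows the well-known deterministic algorithm of Purohit et al. (NeurIPS 2018), a different construction from the lower-bound results in the cited paper, and is included only as an illustration: with parameter lam in (0, 1], it is (1 + lam)-consistent when the prediction is correct and (1 + 1/lam)-robust under arbitrary prediction error.

```python
import math

def ski_rental_buy_day(pred_days: int, buy_cost: int, lam: float) -> int:
    """Learning-augmented ski rental: rent until the returned day, then buy.
    lam in (0, 1] trades consistency (trust the prediction) against
    robustness (hedge against arbitrary prediction error)."""
    if pred_days >= buy_cost:            # prediction: long season, buy early
        return math.ceil(lam * buy_cost)
    return math.ceil(buy_cost / lam)     # prediction: short season, delay buying

def total_cost(true_days: int, buy_day: int, buy_cost: int) -> int:
    """Rent (cost 1/day) before buy_day; pay buy_cost if the season reaches it."""
    if true_days < buy_day:
        return true_days
    return (buy_day - 1) + buy_cost

# Accurate prediction: near-optimal. Bad prediction: still boundedly worse.
b = 10
print(total_cost(100, ski_rental_buy_day(100, b, 0.5), b))  # 14, OPT = 10
print(total_cost(100, ski_rental_buy_day(1, b, 0.5), b))    # 29, still <= 3 * OPT
```

With lam = 0.5 the two runs above realize ratios of 1.4 and 2.9 against the optimal cost of 10, inside the (1 + lam) and (1 + 1/lam) bounds respectively.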