Learning to Schedule Online Tasks with Bandit Feedback
- URL: http://arxiv.org/abs/2402.16463v1
- Date: Mon, 26 Feb 2024 10:11:28 GMT
- Title: Learning to Schedule Online Tasks with Bandit Feedback
- Authors: Yongxin Xu, Shangshang Wang, Hengquan Guo, Xin Liu, Ziyu Shao
- Abstract summary: Online task scheduling serves an integral role for task-intensive applications in cloud computing and crowdsourcing.
We propose a double-optimistic learning based Robbins-Monro (DOL-RM) algorithm.
DOL-RM integrates a learning module that incorporates optimistic estimation of the reward-to-cost ratio and a decision module that uses the Robbins-Monro method to implicitly learn the task arrival distribution while making scheduling decisions.
- Score: 7.671139712158846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online task scheduling serves an integral role for task-intensive
applications in cloud computing and crowdsourcing. Optimal scheduling can
enhance system performance, typically measured by the reward-to-cost ratio,
under some task arrival distribution. On one hand, both reward and cost depend
on task context (e.g., the evaluation metric) and remain black-box in practice,
which makes them hard to model and thus unknown before decision making. On the
other hand, task arrival behavior is sensitive to factors such as unpredictable
system fluctuations, so a prior estimate or a conventional distributional
assumption (e.g., Poisson arrivals) may fail. This poses another practical yet
often neglected challenge: an uncertain task arrival distribution. Towards
effective scheduling under a stationary
environment with various uncertainties, we propose a double-optimistic learning
based Robbins-Monro (DOL-RM) algorithm. Specifically, DOL-RM integrates a
learning module that incorporates optimistic estimation for reward-to-cost
ratio and a decision module that utilizes the Robbins-Monro method to
implicitly learn task arrival distribution while making scheduling decisions.
Theoretically, DOL-RM achieves a convergence-gap guarantee and no-regret learning with a
sub-linear regret of $O(T^{3/4})$, which is the first result for online task
scheduling under uncertain task arrival distribution and unknown reward and
cost. Our numerical results in a synthetic experiment and a real-world
application demonstrate the effectiveness of DOL-RM in achieving the best
cumulative reward-to-cost ratio compared with other state-of-the-art baselines.
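To make the two modules concrete, below is a minimal Python sketch of how double-optimistic ratio estimation and a Robbins-Monro-style update could interact, based only on the abstract's description. It is an illustration under our own assumptions, not the paper's exact algorithm: the class name DOLRMSketch, the UCB/LCB bonus form, the 1/sqrt(t) step size, and the synthetic task types are all placeholders.

```python
# Illustrative sketch only: names and update rules are assumptions, not the
# exact DOL-RM equations from the paper.
import math
import random
from collections import defaultdict

class DOLRMSketch:
    """Optimistic reward-to-cost estimation plus Robbins-Monro ratio tracking."""

    def __init__(self, confidence: float = 1.0):
        self.confidence = confidence
        self.t = 0                      # number of scheduled tasks observed so far
        self.counts = defaultdict(int)  # observations per task context
        self.reward_sum = defaultdict(float)
        self.cost_sum = defaultdict(float)
        self.rho = 0.0                  # running estimate of the reward-to-cost ratio

    def optimistic_ratio(self, context) -> float:
        """Learning module: optimistic reward (UCB) over optimistic cost (LCB)."""
        n = self.counts[context]
        if n == 0:
            return float("inf")  # unexplored contexts look maximally attractive
        bonus = self.confidence * math.sqrt(math.log(max(self.t, 2)) / n)
        reward_ucb = self.reward_sum[context] / n + bonus
        cost_lcb = max(self.cost_sum[context] / n - bonus, 1e-6)
        return reward_ucb / cost_lcb

    def should_schedule(self, context) -> bool:
        """Decision rule: schedule a task if its optimistic ratio beats rho."""
        return self.optimistic_ratio(context) >= self.rho

    def update(self, context, reward: float, cost: float) -> None:
        """Bandit feedback for a scheduled task; Robbins-Monro step on rho."""
        self.t += 1
        self.counts[context] += 1
        self.reward_sum[context] += reward
        self.cost_sum[context] += cost
        step = 1.0 / math.sqrt(self.t)  # diminishing step size
        # Stochastic approximation toward the root of E[reward] - rho * E[cost] = 0.
        self.rho += step * (reward - self.rho * cost)


if __name__ == "__main__":
    random.seed(0)
    scheduler = DOLRMSketch()
    for _ in range(1000):
        context = random.choice(["type_a", "type_b"])  # arrival process unknown to the learner
        if scheduler.should_schedule(context):
            reward = random.random() * (2.0 if context == "type_a" else 1.0)
            cost = 0.5 + random.random()
            scheduler.update(context, reward, cost)
    print(f"estimated reward-to-cost ratio: {scheduler.rho:.3f}")
```

In this sketch the Robbins-Monro step drives rho toward the root of E[reward] - rho * E[cost] = 0, i.e., the long-run reward-to-cost ratio of scheduled tasks, while the optimistic ratio encourages exploration of rarely scheduled task types.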
Related papers
- Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes.
Despite being theoretically appealing, applications of stochastic dominance in machine learning have been scarce.
We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables.
We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
arXiv Detail & Related papers (2024-02-05T03:21:23Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce Bayesian Adaptive Moment Regularization (BAdam), a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA)
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task are valuable in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Distributional Reward Estimation for Effective Multi-Agent Deep
Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL).
Our main idea is to design the multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.
The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios against SOTA baselines in terms of both effectiveness and robustness.
arXiv Detail & Related papers (2022-10-14T08:31:45Z) - Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z) - Online Task Scheduling for Fog Computing with Multi-Resource Fairness [9.959176097194675]
In fog computing systems, one key challenge is online task scheduling, i.e., deciding the resource allocation for tasks that are continuously generated by end devices.
We propose FairTS, an online task scheduling scheme that learns directly from experience to effectively shorten average task slowdown.
Simulation results show that FairTS outperforms state-of-the-art schemes with an ultra-low task slowdown and better resource fairness.
arXiv Detail & Related papers (2020-08-01T07:57:40Z) - Regularized Online Allocation Problems: Fairness and Beyond [7.433931244705934]
We introduce the regularized online allocation problem, a variant that includes a non-linear regularizer acting on the total resource consumption.
In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources.
The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints; a sketch of this objective in symbols appears after this entry.
arXiv Detail & Related papers (2020-07-01T14:24:58Z) - Group-Fair Online Allocation in Continuous Time [27.32936573198251]
- Group-Fair Online Allocation in Continuous Time [27.32936573198251]
We consider a continuous-time online learning problem with fairness considerations.
We show that this formulation recovers reward-maximizing, max-min fair and proportionally fair allocation rules.
We propose a novel online learning algorithm based on dual ascent optimization for time averages.
arXiv Detail & Related papers (2020-06-11T21:56:53Z) - Hierarchical Reinforcement Learning as a Model of Human Task
Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.