Scheduling Servers with Stochastic Bilinear Rewards
- URL: http://arxiv.org/abs/2112.06362v1
- Date: Mon, 13 Dec 2021 00:37:20 GMT
- Title: Scheduling Servers with Stochastic Bilinear Rewards
- Authors: Jung-hun Kim and Milan Vojnovic
- Abstract summary: We study a multi-class, multi-server queueing system with rewards of job-server assignments following a bilinear model in feature vectors representing jobs and servers.
We propose a scheduling algorithm that uses a linear bandit algorithm along with dynamic allocation of jobs to servers.
- Score: 3.5408022972081685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we study a multi-class, multi-server queueing system with
stochastic rewards of job-server assignments following a bilinear model in
feature vectors representing jobs and servers. Our goal is regret minimization
against an oracle policy that has complete information about the system
parameters. We propose a scheduling algorithm that uses a linear bandit
algorithm along with dynamic allocation of jobs to servers. For the baseline
setting, in which mean job service times are identical for all jobs, we show
that our algorithm has a sub-linear regret, as well as a sub-linear bound on
the mean queue length, in the horizon time. We further show that similar bounds
hold under more general assumptions, allowing for non-identical mean job
service times for different job classes and a time-varying set of server
classes. We also show that better regret and mean queue length bounds can be
guaranteed by an algorithm having access to traffic intensities of job classes.
We present results of numerical experiments demonstrating how regret and mean
queue length of our algorithms depend on various system parameters and compare
their performance against a previously proposed algorithm using synthetic
randomly generated data and a real-world cluster computing data trace.
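The bilinear reward model above can be reduced to a standard linear bandit by working with the vectorized outer product of the job and server feature vectors. The following is a minimal sketch of that reduction with a ridge-regression parameter estimate (the least-squares core of linear-bandit estimators such as LinUCB/OFUL); the dimensions, noise level, and sampling scheme are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_job, d_srv = 3, 3
theta_true = rng.normal(size=(d_job, d_srv))  # unknown bilinear parameter (assumed)

def reward(x, y):
    """Stochastic bilinear reward: x^T Theta y plus Gaussian noise."""
    return x @ theta_true @ y + rng.normal(scale=0.1)

# Reduce the bilinear model to a linear one over z = vec(x y^T),
# then estimate the parameter by ridge regression.
lam = 1.0
d = d_job * d_srv
A = lam * np.eye(d)   # regularized Gram matrix
b = np.zeros(d)
for _ in range(2000):
    x = rng.normal(size=d_job)
    y = rng.normal(size=d_srv)
    z = np.outer(x, y).ravel()   # vec(x y^T), so x^T Theta y = z . vec(Theta)
    A += np.outer(z, z)
    b += reward(x, y) * z
theta_hat = np.linalg.solve(A, b).reshape(d_job, d_srv)
err = np.linalg.norm(theta_hat - theta_true)
print(f"estimation error: {err:.3f}")
```

A bandit scheduler would additionally add an optimism bonus to these estimates when assigning jobs to servers; the sketch only shows the estimation step.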
Related papers
- Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems [21.944723061337267]
We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers.
Unlike in homogeneous systems, a threshold policy, which routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system.
We propose ACHQ, an efficient policy gradient based algorithm with a low dimensional soft threshold policy parameterization.
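The threshold rule described above can be sketched as a small routing function; the threshold value and the server-state interface are assumptions for illustration, not details from the paper.

```python
# Toy threshold routing rule for a one-fast-one-slow two-server system.
THRESHOLD = 5  # assumed value; the optimal threshold depends on service rates

def route(queue_len, slow_idle, fast_idle):
    """Route the head-of-line job: prefer the fast server; use the slow
    server only when the backlog exceeds the threshold."""
    if fast_idle:
        return "fast"
    if slow_idle and queue_len > THRESHOLD:
        return "slow"
    return "wait"

# Example decisions:
print(route(queue_len=2, slow_idle=True, fast_idle=False))  # short queue: hold for fast
print(route(queue_len=8, slow_idle=True, fast_idle=False))  # long queue: spill to slow
```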
arXiv Detail & Related papers (2024-02-02T05:22:41Z)
- Online Optimization for Randomized Network Resource Allocation with Long-Term Constraints [0.610240618821149]
We study an optimal online resource reservation problem in a simple communication network.
We propose an online saddle-point algorithm for which we present an upper bound for the associated K-benchmark regret.
arXiv Detail & Related papers (2023-05-24T20:47:17Z)
- A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds algorithm for this general class of problems, with no-regret guarantees both when rewards and constraints are drawn from an unknown stochastic model and when they are chosen adversarially at each round.
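A common way to handle long-term constraints of this kind is a primal-dual scheme: pick actions greedily against a Lagrangian, and grow a dual variable with the cumulative violation. The sketch below is an assumed illustration of that generic idea, not the algorithm from the paper; the action set, means, and step size are all made up.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_actions = 5000, 4
rewards = rng.uniform(0, 1, size=n_actions)    # assumed mean reward per action
costs = rng.uniform(-0.5, 0.5, size=n_actions) # assumed mean constraint cost (<= 0 is feasible)

lam, eta = 0.0, 0.05
total_reward, total_violation = 0.0, 0.0
for t in range(T):
    # choose the action maximizing reward minus dual-weighted constraint cost
    a = int(np.argmax(rewards - lam * costs))
    total_reward += rewards[a] + rng.normal(scale=0.1)
    g = costs[a] + rng.normal(scale=0.1)
    total_violation += g
    lam = max(0.0, lam + eta * g)  # dual ascent on the observed violation
print(f"avg reward: {total_reward / T:.3f}, avg violation: {total_violation / T:.3f}")
```

The dual update keeps the time-averaged violation small: whenever the constraint is violated, lam grows and shifts the argmax toward feasible actions.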
arXiv Detail & Related papers (2022-09-15T16:59:19Z)
- Optimal Admission Control for Multiclass Queues with Time-Varying Arrival Rates via State Abstraction [16.99621896314678]
We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks.
The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon.
We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly.
arXiv Detail & Related papers (2022-03-14T12:38:13Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Learning to Schedule [3.5408022972081685]
This paper proposes a learning and scheduling algorithm to minimize the expected cumulative holding cost incurred by jobs.
In each time slot, the server can process a job while receiving the realized random holding costs of the jobs remaining in the system.
arXiv Detail & Related papers (2021-05-28T08:04:06Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems.
First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch.
Second, for diverse system scales and structures, we use graph neural networks to embed system state information.
Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
arXiv Detail & Related papers (2021-01-17T03:45:25Z)
- Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased gradients of the value function, which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z)
- An Online Algorithm for Computation Offloading in Non-Stationary Environments [12.843328612860244]
We consider the latency problem in a task-offloading scenario, where multiple servers are available to the user equipment for outsourcing computational tasks.
To account for the temporally dynamic nature of the wireless links and the availability of the computing resources, we model the server selection as a multi-armed bandit (MAB) problem.
We propose a novel online learning algorithm based on the principle of optimism in the face of uncertainty, which outperforms the state-of-the-art algorithms by up to 1s.
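Optimism in the face of uncertainty for server selection can be illustrated with a plain UCB1-style index (here minimizing latency, so we take a lower confidence bound); this is a generic sketch with assumed latencies, not the paper's algorithm or its latency model.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
mean_latency = np.array([0.9, 0.5, 0.7])  # assumed per-server mean latencies
n_servers = len(mean_latency)
counts = np.zeros(n_servers)
sums = np.zeros(n_servers)

def select(t):
    """Pick a server: try each once, then use the optimistic (lowest LCB) index."""
    for s in range(n_servers):
        if counts[s] == 0:
            return s
    means = sums / counts
    bonus = np.sqrt(2 * math.log(t + 1) / counts)
    # lower latency is better, so optimism subtracts the confidence bonus
    return int(np.argmin(means - bonus))

for t in range(3000):
    s = select(t)
    latency = np.clip(mean_latency[s] + rng.normal(scale=0.1), 0, None)
    counts[s] += 1
    sums[s] += latency
best = int(np.argmax(counts))
print("most-selected server:", best)
```

After enough rounds the confidence bonuses of the slower servers shrink and the lowest-latency server dominates the selection counts.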
arXiv Detail & Related papers (2020-06-22T07:00:47Z)
- Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.