Scheduling Servers with Stochastic Bilinear Rewards
- URL: http://arxiv.org/abs/2112.06362v3
- Date: Sun, 1 Sep 2024 07:45:49 GMT
- Title: Scheduling Servers with Stochastic Bilinear Rewards
- Authors: Jung-hun Kim, Milan Vojnovic
- Abstract summary: A system optimization problem arises in scheduling multi-class, multi-server queueing systems under uncertainty.
We propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward maximization, incorporating a bandit strategy.
Our algorithm achieves sub-linear regret and a sub-linear mean holding cost (and queue-length bound) with respect to the time horizon, thus guaranteeing queueing system stability.
- Score: 7.519872646378837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address a control system optimization problem that arises in multi-class, multi-server queueing system scheduling with uncertainty. In this scenario, jobs incur holding costs while awaiting completion, and job-server assignments yield observable stochastic rewards with unknown mean values. The rewards for job-server assignments are assumed to follow a bilinear model with respect to features characterizing jobs and servers. Our objective is regret minimization, aiming to maximize the cumulative reward of job-server assignments over a time horizon while maintaining a bounded total job holding cost, thus ensuring queueing system stability. This problem is motivated by applications in computing services and online platforms. To address this problem, we propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward maximization, incorporating a bandit strategy. Our algorithm achieves sub-linear regret and sub-linear mean holding cost (and queue length bound) with respect to the time horizon, thus guaranteeing queueing system stability. Additionally, we establish stability conditions for distributed iterative algorithms for computing allocations, which are relevant to large-scale system applications. Finally, we validate the efficiency of our algorithm through numerical experiments.
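To make the allocation rule concrete, here is a minimal Python sketch of how the pieces could fit together: a ridge estimator for the bilinear reward model, and a MaxWeight-style greedy assignment trading off queue lengths against estimated rewards. All names and the greedy simplification are our assumptions; the paper's weighted proportional fair allocation and bandit strategy are more refined.
```python
import numpy as np

class BilinearScheduler:
    """Sketch: mean rewards follow r(i, j) ~ x_i^T Theta z_j; Theta is
    estimated by ridge regression on vectorized outer-product features."""

    def __init__(self, d_job, d_server, reg=1.0, v=1.0):
        d = d_job * d_server
        self.A = reg * np.eye(d)   # regularized Gram matrix
        self.b = np.zeros(d)       # accumulated reward-weighted features
        self.v = v                 # weight on the queue-length (cost) term

    def _phi(self, x, z):
        return np.outer(x, z).ravel()  # bilinear feature of a (job, server) pair

    def update(self, x, z, reward):
        phi = self._phi(x, z)
        self.A += np.outer(phi, phi)
        self.b += reward * phi

    def schedule(self, job_feats, server_feats, queue_lens):
        """Each server greedily picks the job class with the best combined
        queue-length weight and estimated mean reward."""
        theta = np.linalg.solve(self.A, self.b)
        return {
            j: int(np.argmax([self.v * queue_lens[i] + self._phi(x, z) @ theta
                              for i, x in enumerate(job_feats)]))
            for j, z in enumerate(server_feats)
        }
```
In each slot one would call `schedule`, observe the realized rewards of the chosen assignments, and feed them back through `update`.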
Related papers
- Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems [21.944723061337267]
We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers.
Unlike in homogeneous systems, a threshold policy that routes jobs to the slow server(s) when the queue length exceeds a certain threshold is known to be optimal for the one-fast-one-slow two-server system.
We propose ACHQ, an efficient policy gradient based algorithm with a low dimensional soft threshold policy parameterization.
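A hedged sketch of what a low-dimensional soft threshold parameterization could look like (the sigmoid form and parameter names are our assumptions, not ACHQ's exact design):
```python
import numpy as np

def soft_threshold_policy(queue_len, theta, beta=2.0):
    """Probability of routing an arriving job to the slow server(s):
    rises smoothly from 0 to 1 as the queue length crosses the learnable
    threshold `theta`; `beta` controls the sharpness of the transition."""
    return 1.0 / (1.0 + np.exp(-beta * (queue_len - theta)))
```
Because the policy is smooth in `theta`, policy-gradient updates remain well defined even though the limiting hard-threshold policy is not differentiable.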
arXiv Detail & Related papers (2024-02-02T05:22:41Z) - Online Optimization for Randomized Network Resource Allocation with Long-Term Constraints [0.610240618821149]
We study an optimal online resource reservation problem in a simple communication network.
We propose an online saddle-point algorithm for which we present an upper bound for the associated K-benchmark regret.
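As a rough illustration of an online saddle-point update for this kind of problem (the step size, penalty form, and projection are our illustrative choices, not the paper's exact scheme):
```python
import numpy as np

def saddle_point_step(x, lam, grad_loss, violation, eta=0.05, cap=1.0):
    """One primal-dual round: descend on the reservation loss plus the
    Lagrangian penalty, then ascend the dual price on observed violation."""
    x = np.clip(x - eta * (grad_loss + lam * np.ones_like(x)), 0.0, cap)
    lam = max(0.0, lam + eta * violation)  # dual variable prices the constraint
    return x, lam
```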
arXiv Detail & Related papers (2023-05-24T20:47:17Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds algorithm for this general class of problems, with no-regret guarantees both when rewards and constraints are drawn from an unknown stochastic model and when they are chosen adversarially at each round.
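A bare-bones primal-dual round in this spirit might pair a multiplicative-weights primal player with a dual price on cumulative violation; the step size and penalty form below are illustrative assumptions:
```python
import numpy as np

def primal_dual_round(weights, lam, rewards, costs, eta=0.1):
    """Play from softmax weights over actions; each reward is discounted by
    the Lagrangian penalty lam * cost, and lam ascends on realized violation."""
    weights = weights * np.exp(eta * (rewards - lam * costs))
    weights /= weights.sum()
    action = np.random.choice(len(weights), p=weights)
    lam = max(0.0, lam + eta * costs[action])
    return weights, lam, action
```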
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Optimal Admission Control for Multiclass Queues with Time-Varying Arrival Rates via State Abstraction [16.99621896314678]
We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks.
The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon.
We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly.
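The threshold structure can be seen in a toy one-server model with fixed service time, solved by backward induction (this simplified MDP and all names are our illustration, not the paper's hybrid MDP):
```python
import numpy as np

def admission_policy(horizon, prices, probs, s=3):
    """Backward induction over the value of an idle server: accept a task
    with price p at time t iff p exceeds the opportunity cost of tying up
    the server for s slots, i.e. p >= V[t+1] - V[t+s]."""
    V = np.zeros(horizon + s + 1)
    thresholds = np.zeros(horizon)
    for t in range(horizon - 1, -1, -1):
        thresholds[t] = V[t + 1] - V[t + s]          # opportunity cost
        gains = np.maximum(prices + V[t + s], V[t + 1])
        V[t] = np.dot(probs, gains)                  # expected optimal value
    return V, thresholds

# Example: prices 1 or 5 with equal odds over a 20-slot horizon.
V, th = admission_policy(20, np.array([1.0, 5.0]), np.array([0.5, 0.5]))
```
Accepting is optimal exactly when the offered price exceeds `thresholds[t]`, the value lost by occupying the server for `s` slots.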
arXiv Detail & Related papers (2022-03-14T12:38:13Z) - Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety with high probability, at the cost of accepting tolerances psi and phi.
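The safety mechanism can be sketched as restricting the optimistic choice to bids whose confidence bounds meet the ROI and budget constraints up to the tolerances psi and phi (the estimators and bound shapes below are our assumptions, not GCB_safe's exact construction):
```python
import numpy as np

def safe_bid(rev_ucb, spend_ucb, roi_lcb, budget, roi_target, psi=0.05, phi=0.05):
    """Pick the bid with the highest optimistic revenue among those whose
    ROI lower bound and spend upper bound satisfy the relaxed constraints."""
    feasible = (roi_lcb >= roi_target - psi) & (spend_ucb <= budget + phi)
    if not feasible.any():
        return None                        # caller falls back to a known-safe default
    idx = np.where(feasible)[0]
    return int(idx[np.argmax(rev_ucb[idx])])
```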
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Learning to Schedule [3.5408022972081685]
This paper proposes a learning and scheduling algorithm to minimize the expected cumulative holding cost incurred by jobs.
In each time slot, the server can process a job while receiving the realized random holding costs of the jobs remaining in the system.
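For intuition, the classic solution with known parameters is the c-mu rule: serve the class with the largest product of holding-cost rate and service rate. With unknown rates, one can plug in empirical estimates plus an optimism bonus, as in this hedged sketch (constants and names are illustrative):
```python
import numpy as np

def pick_job(costs, completions, attempts, t):
    """Optimistic c-mu index: empirical service rate plus a UCB-style bonus,
    weighted by each class's holding-cost rate."""
    n = np.maximum(attempts, 1)
    mu_hat = completions / n
    bonus = np.sqrt(2.0 * np.log(t + 1) / n)
    return int(np.argmax(costs * np.minimum(mu_hat + bonus, 1.0)))
```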
arXiv Detail & Related papers (2021-05-28T08:04:06Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
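One way to read the improper-learning idea is as learning softmax mixing weights over the atomic policies with a REINFORCE-style update on observed (negative) delay; the update below is our illustrative sketch, not the paper's exact estimator:
```python
import numpy as np

def reinforce_step(logits, chosen, reward, baseline, lr=0.01):
    """Ascend the log-likelihood of the chosen atomic policy, scaled by the
    advantage (reward minus baseline), for a softmax mixture over policies."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad = -probs
    grad[chosen] += 1.0                  # gradient of log pi(chosen) w.r.t. logits
    return logits + lr * (reward - baseline) * grad
```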
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems.
First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch.
Second, for diverse system scales and structures, we use graph neural networks to embed system state information.
Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
arXiv Detail & Related papers (2021-01-17T03:45:25Z) - Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased navigation gradients of the value function which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z) - An Online Algorithm for Computation Offloading in Non-Stationary Environments [12.843328612860244]
We consider the latency problem in a task-offloading scenario, where multiple servers are available to the user equipment for outsourcing computational tasks.
To account for the temporally dynamic nature of the wireless links and the availability of the computing resources, we model the server selection as a multi-armed bandit (MAB) problem.
We propose a novel online learning algorithm based on the principle of optimism in the face of uncertainty, which outperforms state-of-the-art algorithms by up to 1 second.
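To handle non-stationarity, an optimistic selector can discard stale observations; the sliding-window UCB below is a common pattern and our illustrative stand-in for the paper's algorithm (the window size and bonus constant are assumptions):
```python
from collections import deque
import numpy as np

class SlidingWindowUCB:
    """Keep only the last `window` latency samples per server and pick the
    server with the best optimistic (lowest) latency estimate."""

    def __init__(self, n_servers, window=200, c=1.0):
        self.obs = [deque(maxlen=window) for _ in range(n_servers)]
        self.c = c

    def select(self, t):
        for i, d in enumerate(self.obs):
            if not d:
                return i                   # pull every server once first
        scores = [np.mean(d) - self.c * np.sqrt(np.log(t + 1) / len(d))
                  for d in self.obs]
        return int(np.argmin(scores))      # optimism = lower latency bound

    def update(self, server, latency):
        self.obs[server].append(latency)
```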
arXiv Detail & Related papers (2020-06-22T07:00:47Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)