Learning a Discrete Set of Optimal Allocation Rules in a Queueing System
with Unknown Service Rate
- URL: http://arxiv.org/abs/2202.02419v2
- Date: Thu, 27 Jul 2023 22:56:22 GMT
- Authors: Saghar Adler, Mehrdad Moharrami and Vijay Subramanian
- Abstract summary: We study admission control for a system with unknown arrival and service rates.
In our model, at every job arrival, a dispatcher decides to assign the job to an available server or block it.
Our goal is to design a dispatching policy that maximizes the long-term average reward for the dispatcher.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the wide range of modern applications of the Erlang-B blocking
model beyond communication networks and call centers to sizing and pricing in
design production systems, messaging systems, and app-based parking systems, we
study admission control for such a system but with unknown arrival and service
rates. In our model, at every job arrival, a dispatcher decides to assign the
job to an available server or block it. Every served job yields a fixed reward
for the dispatcher, but it also results in a cost per unit time of service. Our
goal is to design a dispatching policy that maximizes the long-term average
reward for the dispatcher based on observing only the arrival times and the
state of the system at each arrival that reflects a realistic sampling of such
systems. Critically, the dispatcher observes neither the service times nor
departure times so that standard reinforcement learning-based approaches that
use reward signals do not apply. Hence, we develop our learning-based dispatch
scheme as a parametric learning problem à la self-tuning adaptive control. In
our problem, certainty-equivalent control switches between an always-admit-if-room
policy (which explores infinitely often) and a never-admit policy (which
immediately terminates learning), a dichotomy distinct from the adaptive control
literature. Our learning scheme therefore uses the always-admit-if-room policy
judiciously so that learning does not stall. We prove that for all service rates, the proposed
policy asymptotically learns to take the optimal action and present finite-time
regret guarantees. The extreme contrast in the certainty equivalent optimal
control policies leads to difficulties in learning that show up in our regret
bounds for different parameter regimes: constant regret in one regime versus
regret growing logarithmically in the other.
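The certainty-equivalent rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: a served job pays reward R but incurs cost c per unit of service time, so under service rate mu its expected net value is R - c/mu, and the rule admits iff R * mu_hat > c for a plug-in estimate mu_hat. The estimator itself is assumed to be supplied externally, and the power-of-two exploration schedule is a hypothetical stand-in for the paper's judicious use of the always-admit-if-room policy.

```python
class CEDispatcher:
    """Certainty-equivalent admission rule with forced exploration (sketch).

    The dispatcher observes only the occupancy at each arrival. It admits
    on a sparse deterministic schedule regardless of the current estimate
    mu_hat, so that the estimate keeps improving and learning never stalls.
    """

    def __init__(self, reward, cost_rate, servers):
        self.R = reward          # fixed reward per served job
        self.c = cost_rate       # cost per unit time of service
        self.servers = servers   # number of servers (Erlang-B capacity)
        self.arrival_idx = 0     # count of arrivals seen so far

    def _explore(self, k):
        # Sparse schedule: force admission at arrivals 1, 2, 4, 8, ...
        # (a hypothetical stand-in for the paper's exploration mechanism).
        return k & (k - 1) == 0

    def decide(self, occupancy, mu_hat):
        """Return True to admit the arriving job, False to block it."""
        self.arrival_idx += 1
        if occupancy >= self.servers:
            return False                  # no free server: must block
        if self._explore(self.arrival_idx):
            return True                   # forced exploration
        return self.R * mu_hat > self.c   # certainty-equivalent rule
```

Note that when mu_hat is pessimistic (R * mu_hat <= c) the plain certainty-equivalent rule would block forever and never observe new service information; the exploration schedule is precisely what breaks that deadlock.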
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- A stabilizing reinforcement learning approach for sampled systems with partially unknown models [0.0]
We suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting.
To achieve the claimed results, we employ techniques of classical adaptive control.
The method is tested in adaptive traction control and cruise control where it proved to significantly reduce the cost.
arXiv Detail & Related papers (2022-08-31T09:20:14Z)
- Online Control of Unknown Time-Varying Dynamical Systems [48.75672260851758]
We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model.
We study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies.
arXiv Detail & Related papers (2022-02-16T06:57:14Z)
- Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning [24.56889192688925]
Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
arXiv Detail & Related papers (2021-12-23T00:44:38Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is either cost-ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty regularized and Sample Efficient Batch Optimization.
Our framework discovers novel and high quality samples for each deployment to enable efficient data collection.
arXiv Detail & Related papers (2021-02-23T01:30:55Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Queue-Learning: A Reinforcement Learning Approach for Providing Quality of Service [1.8477401359673706]
Service-rate control is a common mechanism for providing guarantees in service systems.
In this paper, we introduce a reinforcement learning-based (RL-based) service-rate controller.
Our controller provides explicit probabilistic guarantees on the end-to-end delay of the system.
arXiv Detail & Related papers (2021-01-12T17:28:57Z)
- Complementary Meta-Reinforcement Learning for Fault-Adaptive Control [1.8799681615947088]
Adaptive fault-tolerant control maintains degraded but safe performance when faults occur, as opposed to allowing unsafe conditions or catastrophic events.
We present a meta-reinforcement learning approach that quickly adapts its control policy to changing conditions.
We evaluate our approach on an aircraft fuel transfer system under abrupt faults.
arXiv Detail & Related papers (2020-09-26T16:30:53Z)
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.