Learning a Discrete Set of Optimal Allocation Rules in a Queueing System
with Unknown Service Rate
- URL: http://arxiv.org/abs/2202.02419v2
- Date: Thu, 27 Jul 2023 22:56:22 GMT
- Authors: Saghar Adler, Mehrdad Moharrami and Vijay Subramanian
- Abstract summary: We study admission control for a system with unknown arrival and service rates.
In our model, at every job arrival, a dispatcher decides to assign the job to an available server or block it.
Our goal is to design a dispatching policy that maximizes the long-term average reward for the dispatcher.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the wide range of modern applications of the Erlang-B blocking
model beyond communication networks and call centers to sizing and pricing in
design production systems, messaging systems, and app-based parking systems, we
study admission control for such a system but with unknown arrival and service
rates. In our model, at every job arrival, a dispatcher decides to assign the
job to an available server or block it. Every served job yields a fixed reward
for the dispatcher, but it also results in a cost per unit time of service. Our
goal is to design a dispatching policy that maximizes the long-term average
reward for the dispatcher based on observing only the arrival times and the
state of the system at each arrival that reflects a realistic sampling of such
systems. Critically, the dispatcher observes neither the service times nor
departure times so that standard reinforcement learning-based approaches that
use reward signals do not apply. Hence, we develop our learning-based dispatch
scheme as a parametric learning problem à la self-tuning adaptive control. In
our problem, certainty-equivalent control switches between an always-admit-if-room
policy (which explores infinitely often) and a never-admit policy (which
immediately terminates learning), a dichotomy distinct from the adaptive control
literature. Our learning scheme therefore uses the always-admit-if-room policy
judiciously so that learning does not stall. We prove that for all service rates, the proposed
policy asymptotically learns to take the optimal action and present finite-time
regret guarantees. The extreme contrast in the certainty equivalent optimal
control policies leads to difficulties in learning that show up in our regret
bounds for different parameter regimes: constant regret in one regime versus
regret growing logarithmically in the other.
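The certainty-equivalent rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: a served job pays reward R but incurs cost c per unit of service time, so under service rate mu its expected net value is R - c/mu, and the rule admits iff R * mu_hat > c for a plug-in estimate mu_hat. The estimator itself is assumed to be supplied externally, and the power-of-two exploration schedule is a hypothetical stand-in for the paper's judicious use of the always-admit-if-room policy.

```python
class CEDispatcher:
    """Certainty-equivalent admission rule with forced exploration (sketch).

    The dispatcher observes only the occupancy at each arrival. It admits
    on a sparse deterministic schedule regardless of the current estimate
    mu_hat, so that the estimate keeps improving and learning never stalls.
    """

    def __init__(self, reward, cost_rate, servers):
        self.R = reward          # fixed reward per served job
        self.c = cost_rate       # cost per unit time of service
        self.servers = servers   # number of servers (Erlang-B capacity)
        self.arrival_idx = 0     # count of arrivals seen so far

    def _explore(self, k):
        # Sparse schedule: force admission at arrivals 1, 2, 4, 8, ...
        # (a hypothetical stand-in for the paper's exploration mechanism).
        return k & (k - 1) == 0

    def decide(self, occupancy, mu_hat):
        """Return True to admit the arriving job, False to block it."""
        self.arrival_idx += 1
        if occupancy >= self.servers:
            return False                  # no free server: must block
        if self._explore(self.arrival_idx):
            return True                   # forced exploration
        return self.R * mu_hat > self.c   # certainty-equivalent rule
```

Note that when mu_hat is pessimistic (R * mu_hat <= c) the plain certainty-equivalent rule would block forever and never observe new service information; the exploration schedule is precisely what breaks that deadlock.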
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- A stabilizing reinforcement learning approach for sampled systems with partially unknown models [0.0]
We suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting.
To achieve the claimed results, we employ techniques of classical adaptive control.
The method is tested in adaptive traction control and cruise control where it proved to significantly reduce the cost.
arXiv Detail & Related papers (2022-08-31T09:20:14Z)
- Online Control of Unknown Time-Varying Dynamical Systems [48.75672260851758]
We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model.
We study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies.
arXiv Detail & Related papers (2022-02-16T06:57:14Z)
- Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning [24.56889192688925]
Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
arXiv Detail & Related papers (2021-12-23T00:44:38Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is either cost-ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty regularized and Sample Efficient Batch Optimization.
Our framework discovers novel and high quality samples for each deployment to enable efficient data collection.
arXiv Detail & Related papers (2021-02-23T01:30:55Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Queue-Learning: A Reinforcement Learning Approach for Providing Quality of Service [1.8477401359673706]
Service-rate control is a common mechanism for providing guarantees in service systems.
In this paper, we introduce a reinforcement learning-based (RL-based) service-rate controller.
Our controller provides explicit probabilistic guarantees on the end-to-end delay of the system.
arXiv Detail & Related papers (2021-01-12T17:28:57Z)
- Complementary Meta-Reinforcement Learning for Fault-Adaptive Control [1.8799681615947088]
Adaptive fault-tolerant control maintains degraded but safe performance when faults occur, as opposed to allowing unsafe conditions or catastrophic events.
We present a meta-reinforcement learning approach that quickly adapts its control policy to changing conditions.
We evaluate our approach on an aircraft fuel transfer system under abrupt faults.
arXiv Detail & Related papers (2020-09-26T16:30:53Z)
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.