Learning Optimal Admission Control in Partially Observable Queueing Networks
- URL: http://arxiv.org/abs/2308.02391v1
- Date: Fri, 4 Aug 2023 15:40:23 GMT
- Title: Learning Optimal Admission Control in Partially Observable Queueing Networks
- Authors: Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi
- Abstract summary: We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network.
We show that our algorithm has a regret that only depends sub-linearly on the maximal number of jobs in the network, $S$.
- Score: 4.254099382808599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an efficient reinforcement learning algorithm that learns the
optimal admission control policy in a partially observable queueing network.
Specifically, only the arrival and departure times from the network are
observable, and optimality refers to the average holding/rejection cost in
infinite horizon.
While reinforcement learning in Partially Observable Markov Decision
Processes (POMDP) is prohibitively expensive in general, we show that our
algorithm has a regret that only depends sub-linearly on the maximal number of
jobs in the network, $S$. In particular, in contrast with existing regret
analyses, our regret bound does not depend on the diameter of the underlying
Markov Decision Process (MDP), which in most queueing systems is at least
exponential in $S$.
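To see why the diameter can be prohibitive, consider a minimal illustration (not taken from the paper): in a single M/M/1 queue holding at most $S$ jobs with arrival rate $\lambda$ and service rate $\mu > \lambda$, the expected time to move from the empty state to the full state, a lower bound on the diameter, grows roughly like $(\mu/\lambda)^S$. The sketch below computes it from the standard birth-and-death first-passage recursion; the rates are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): expected first-passage time from an
# empty to a full M/M/1 queue with at most S jobs.  This lower-bounds the
# diameter of the underlying MDP and grows roughly like (mu/lam)^S.

def hitting_time_empty_to_full(lam: float, mu: float, S: int) -> float:
    """Expected time for a birth-death chain with birth rate lam and death rate mu
    to first reach state S starting from state 0."""
    t_up = 1.0 / lam                    # E[T_{0 -> 1}]
    total = 0.0
    for _ in range(S):
        total += t_up
        # Recursion: E[T_{i -> i+1}] = 1/lam + (mu/lam) * E[T_{i-1 -> i}]
        t_up = 1.0 / lam + (mu / lam) * t_up
    return total

if __name__ == "__main__":
    lam, mu = 1.0, 2.0                  # assumed rates, utilisation rho = 0.5
    for S in (5, 10, 20, 40):
        print(S, hitting_time_empty_to_full(lam, mu, S))
```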
The novelty of our approach is to leverage Norton's equivalent theorem for
closed product-form queueing networks and an efficient reinforcement learning
algorithm for MDPs with the structure of birth-and-death processes.
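As a rough sketch of the Norton-equivalence step (under illustrative assumptions, not the authors' implementation): for a closed product-form network of single-server exponential stations, the subnetwork seen from the admission point can be replaced by one flow-equivalent server whose state-dependent rate $\mu_{eq}(n)$ is the throughput of that closed subnetwork with $n$ jobs, computable here by exact Mean Value Analysis. The station rates, visit ratios, and job bound below are made-up parameters.

```python
# Hedged sketch: state-dependent rates of a Norton flow-equivalent server for a
# closed product-form network of single-server FCFS/exponential stations,
# obtained with exact Mean Value Analysis (MVA).  All parameters are assumptions.
from typing import List

def flow_equivalent_rates(mu: List[float], visits: List[float], S: int) -> List[float]:
    """Return [X(1), ..., X(S)], the throughput of the closed network with n jobs,
    used as the service rate mu_eq(n) of the aggregated flow-equivalent station."""
    M = len(mu)
    L = [0.0] * M                                   # mean queue lengths with n-1 jobs
    rates = []
    for n in range(1, S + 1):
        # Arrival theorem: mean waiting time at station i with n jobs in the network
        W = [(1.0 + L[i]) / mu[i] for i in range(M)]
        X = n / sum(visits[i] * W[i] for i in range(M))   # network throughput
        L = [X * visits[i] * W[i] for i in range(M)]
        rates.append(X)
    return rates

if __name__ == "__main__":
    # e.g. three stations behind the admission point, at most 10 jobs admitted
    print(flow_equivalent_rates(mu=[2.0, 3.0, 4.0], visits=[1.0, 0.5, 0.5], S=10))
```

After this aggregation, admission control reduces to a birth-and-death MDP on the number of jobs in the network, which is the structure the learning algorithm exploits.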
Related papers
- Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes [7.028778922533688]
Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty.
We study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning.
arXiv Detail & Related papers (2024-10-14T14:52:23Z)
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach to compute unbiased Monte Carlo-based estimates.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z) - Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks [35.78834550608041]
Classical stochastic network optimization (SNO) algorithms require network conditions to be stationary over time.
Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback.
Our proposed UMO2 algorithm ensures network stability and also matches the utility performance of any "mildly varying" reference policy.
arXiv Detail & Related papers (2024-08-29T02:18:28Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization [53.11458949694947]
We consider an adaptive version of the content-dependent online influence maximization problem, where seed nodes are sequentially activated based on real-time feedback.
Our algorithm maintains a network model estimate and selects seeds adaptively, exploring the social network while optimistically improving toward the optimal policy.
arXiv Detail & Related papers (2022-06-29T18:17:28Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximate gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy-gradient-based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems [8.611328447624677]
We consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks.
Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem.
We propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space.
arXiv Detail & Related papers (2020-11-14T22:12:27Z)
- Queueing Network Controls via Deep Reinforcement Learning [0.0]
We develop a proximal policy optimization (PPO) algorithm for queueing networks.
The algorithm consistently generates control policies that outperform the state of the art in the literature.
A key to the successes of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function.
arXiv Detail & Related papers (2020-07-31T01:02:57Z)
- Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs).
We propose two near-optimal and oracle-efficient algorithms for FMDPs.
Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
arXiv Detail & Related papers (2020-02-06T15:19:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.