Learning Optimal Admission Control in Partially Observable Queueing Networks
- URL: http://arxiv.org/abs/2308.02391v1
- Date: Fri, 4 Aug 2023 15:40:23 GMT
- Title: Learning Optimal Admission Control in Partially Observable Queueing Networks
- Authors: Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi
- Abstract summary: We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network.
We show that our algorithm has a regret that only depends sub-linearly on the maximal number of jobs in the network, $S$.
- Score: 4.254099382808599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an efficient reinforcement learning algorithm that learns the
optimal admission control policy in a partially observable queueing network.
Specifically, only the arrival and departure times from the network are
observable, and optimality refers to the average holding/rejection cost in
infinite horizon.
While reinforcement learning in Partially Observable Markov Decision
Processes (POMDP) is prohibitively expensive in general, we show that our
algorithm has a regret that only depends sub-linearly on the maximal number of
jobs in the network, $S$. In particular, in contrast with existing regret
analyses, our regret bound does not depend on the diameter of the underlying
Markov Decision Process (MDP), which in most queueing systems is at least
exponential in $S$.
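To see why the diameter can be prohibitive, consider a minimal illustration (not taken from the paper): in a single M/M/1 queue holding at most $S$ jobs with arrival rate $\lambda$ and service rate $\mu > \lambda$, the expected time to move from the empty state to the full state, a lower bound on the diameter, grows roughly like $(\mu/\lambda)^S$. The sketch below computes it from the standard birth-and-death first-passage recursion; the rates are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): expected first-passage time from an
# empty to a full M/M/1 queue with at most S jobs.  This lower-bounds the
# diameter of the underlying MDP and grows roughly like (mu/lam)^S.

def hitting_time_empty_to_full(lam: float, mu: float, S: int) -> float:
    """Expected time for a birth-death chain with birth rate lam and death rate mu
    to first reach state S starting from state 0."""
    t_up = 1.0 / lam                    # E[T_{0 -> 1}]
    total = 0.0
    for _ in range(S):
        total += t_up
        # Recursion: E[T_{i -> i+1}] = 1/lam + (mu/lam) * E[T_{i-1 -> i}]
        t_up = 1.0 / lam + (mu / lam) * t_up
    return total

if __name__ == "__main__":
    lam, mu = 1.0, 2.0                  # assumed rates, utilisation rho = 0.5
    for S in (5, 10, 20, 40):
        print(S, hitting_time_empty_to_full(lam, mu, S))
```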
The novelty of our approach is to leverage Norton's equivalent theorem for
closed product-form queueing networks and an efficient reinforcement learning
algorithm for MDPs with the structure of birth-and-death processes.
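As a rough sketch of the Norton-equivalence step (under illustrative assumptions, not the authors' implementation): for a closed product-form network of single-server exponential stations, the subnetwork seen from the admission point can be replaced by one flow-equivalent server whose state-dependent rate $\mu_{eq}(n)$ is the throughput of that closed subnetwork with $n$ jobs, computable here by exact Mean Value Analysis. The station rates, visit ratios, and job bound below are made-up parameters.

```python
# Hedged sketch: state-dependent rates of a Norton flow-equivalent server for a
# closed product-form network of single-server FCFS/exponential stations,
# obtained with exact Mean Value Analysis (MVA).  All parameters are assumptions.
from typing import List

def flow_equivalent_rates(mu: List[float], visits: List[float], S: int) -> List[float]:
    """Return [X(1), ..., X(S)], the throughput of the closed network with n jobs,
    used as the service rate mu_eq(n) of the aggregated flow-equivalent station."""
    M = len(mu)
    L = [0.0] * M                                   # mean queue lengths with n-1 jobs
    rates = []
    for n in range(1, S + 1):
        # Arrival theorem: mean waiting time at station i with n jobs in the network
        W = [(1.0 + L[i]) / mu[i] for i in range(M)]
        X = n / sum(visits[i] * W[i] for i in range(M))   # network throughput
        L = [X * visits[i] * W[i] for i in range(M)]
        rates.append(X)
    return rates

if __name__ == "__main__":
    # e.g. three stations behind the admission point, at most 10 jobs admitted
    print(flow_equivalent_rates(mu=[2.0, 3.0, 4.0], visits=[1.0, 0.5, 0.5], S=10))
```

After this aggregation, admission control reduces to a birth-and-death MDP on the number of jobs in the network, which is the structure the learning algorithm exploits.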
Related papers
- Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes [7.028778922533688]
Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty.
We study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning.
arXiv Detail & Related papers (2024-10-14T14:52:23Z)
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach to compute unbiased Monte Carlo-based estimates.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z) - Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks [35.78834550608041]
Classical stochastic network optimization (SNO) algorithms require network conditions to be stationary over time.
Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback.
Our proposed UMO2 algorithm ensures network stability and also matches the utility performance of any "mildly varying" reference policy.
arXiv Detail & Related papers (2024-08-29T02:18:28Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization [53.11458949694947]
We consider an adaptive version of the content-dependent online influence maximization problem, where seed nodes are sequentially activated based on real-time feedback.
Our algorithm maintains a network model estimate and selects seeds adaptively, exploring the social network while optimistically improving toward the optimal policy.
arXiv Detail & Related papers (2022-06-29T18:17:28Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximate gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy-gradient-based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems [8.611328447624677]
We consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks.
Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem.
We propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space.
arXiv Detail & Related papers (2020-11-14T22:12:27Z)
- Queueing Network Controls via Deep Reinforcement Learning [0.0]
We develop a proximal policy optimization (PPO) algorithm for queueing networks.
The algorithm consistently generates control policies that outperform the state of the art in the literature.
A key to the successes of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function.
arXiv Detail & Related papers (2020-07-31T01:02:57Z)
- Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting [24.90164851620799]
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs).
We propose two near-optimal and oracle-efficient algorithms for FMDPs.
Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.
arXiv Detail & Related papers (2020-02-06T15:19:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.