Reinforcement Learning and Regret Bounds for Admission Control
- URL: http://arxiv.org/abs/2406.04766v1
- Date: Fri, 7 Jun 2024 09:09:14 GMT
- Title: Reinforcement Learning and Regret Bounds for Admission Control
- Authors: Lucas Weber, Ana Bušić, Jiamin Zhu
- Abstract summary: The expected regret of any reinforcement learning algorithm is lower bounded by $\Omega\left(\sqrt{DXAT}\right)$ for undiscounted returns.
We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The expected regret of any reinforcement learning algorithm is lower bounded by $\Omega\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case. In the infinite server case, we prove that the dependence of the regret on $S$ disappears.
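To make the setting concrete, below is a minimal simulation sketch of the admission-control problem the abstract describes: an $M/M/c/S$ queue with $m$ job classes, class-dependent admission rewards, and a per-job holding cost. This is not the authors' code; the class name, parameter values, and the event-driven loop are illustrative assumptions, and the threshold policy at the end is used only as an example controller.

```python
import random


class AdmissionControlQueue:
    """Event-driven sketch of an M/M/c/S queue with m job classes.
    On each class-i arrival the controller decides whether to admit the job
    (collecting reward R[i]) or reject it; a holding cost accrues per job
    per unit time. All names and defaults here are illustrative assumptions."""

    def __init__(self, arrival_rates, service_rate, num_servers, buffer_size,
                 rewards, holding_cost):
        self.lam = arrival_rates   # lambda_i for each of the m classes
        self.mu = service_rate     # per-server service rate
        self.c = num_servers       # number of servers (c)
        self.S = buffer_size       # max number of jobs in the system (S)
        self.R = rewards           # class-dependent admission rewards
        self.h = holding_cost      # holding cost per job per unit time
        self.x = 0                 # current number of jobs in the system

    def step(self, policy):
        """Advance to the next event. `policy(x, i)` returns True to admit a
        class-i arrival when x jobs are present. Returns (reward, dt)."""
        # Exponential race between the m arrival streams and the busy servers.
        rates = list(self.lam) + [self.mu * min(self.x, self.c)]
        total_rate = sum(rates)
        dt = random.expovariate(total_rate)
        reward = -self.h * self.x * dt          # holding cost over the interval
        event = random.choices(range(len(rates)), weights=rates)[0]
        if event < len(self.lam):               # arrival of class `event`
            if self.x < self.S and policy(self.x, event):
                self.x += 1
                reward += self.R[event]
        elif self.x > 0:                        # service completion
            self.x -= 1
        return reward, dt


if __name__ == "__main__":
    # Two job classes; admit class i only while fewer than threshold[i] jobs
    # are present (a simple threshold policy, shown for illustration only).
    env = AdmissionControlQueue(arrival_rates=[1.0, 0.5], service_rate=1.0,
                                num_servers=2, buffer_size=10,
                                rewards=[4.0, 1.0], holding_cost=0.5)
    threshold = [10, 4]
    policy = lambda x, i: x < threshold[i]
    total_reward, total_time = 0.0, 0.0
    while total_time < 10_000.0:
        r, dt = env.step(policy)
        total_reward += r
        total_time += dt
    print("long-run average reward:", total_reward / total_time)
```

A learning algorithm in the spirit of the paper (UCRL2-style optimism) would replace the fixed thresholds with a policy computed from optimistic estimates of the unknown arrival and service rates; the sketch above only fixes the environment dynamics against which regret would be measured.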
Related papers
- Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z)
- On learning Whittle index policy for restless bandits with scalable regret [1.0152838128195465]
Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies.
The cumulative regret of most RL algorithms scales as $\tilde{O}(\mathsf{S}\sqrt{\mathsf{A}T})$, where $\mathsf{S}$ is the size of the state space.
We present a model-based RL algorithm for such problems which has scalable regret.
arXiv Detail & Related papers (2022-02-07T19:07:02Z)
- Minimal Expected Regret in Linear Quadratic Control [79.81807680370677]
We devise an online learning algorithm and provide guarantees on its expected regret.
This regret at time $T$ is upper bounded by $\widetilde{O}((d_u+d_x)\sqrt{d_x T})$ when $A$ and $B$ are unknown.
arXiv Detail & Related papers (2021-09-29T14:07:21Z)
- Minimax Regret for Stochastic Shortest Path [63.45407095296692]
We study the Stochastic Shortest Path (SSP) problem, in which an agent has to reach a goal state in minimum total expected cost.
We show that the minimax regret for this setting is $\widetilde{O}(B_\star \sqrt{|S| |A| K})$, where $B_\star$ is a bound on the expected cost of the optimal policy from any state.
Our algorithm runs in polynomial time per episode, and is based on a novel reduction to reinforcement learning in finite-horizon MDPs.
arXiv Detail & Related papers (2021-03-24T10:11:49Z)
- Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
- $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}(H)}{\Delta_{\min}}\log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
- Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost.
We show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S| |A| K})$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)
- Logistic Regression Regret: What's the Catch? [3.7311680121118345]
We derive lower bounds with logarithmic regret under $L_\infty$ constraints on the parameters.
For $L_2$ constraints, it is shown that for large enough $d$, the regret remains linear in $d$ but is no longer logarithmic in $T$.
arXiv Detail & Related papers (2020-02-07T18:36:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.