Robust Restless Bandits: Tackling Interval Uncertainty with Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2107.01689v1
- Date: Sun, 4 Jul 2021 17:21:26 GMT
- Title: Robust Restless Bandits: Tackling Interval Uncertainty with Deep
Reinforcement Learning
- Authors: Jackson A. Killian, Lily Xu, Arpita Biswas, Milind Tambe
- Abstract summary: We introduce Robust Restless Bandits, a generalization of restless multi-arm bandits (RMAB)
We develop solutions for a minimax regret objective when transitions are given by interval uncertainties.
We introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs.
- Score: 31.515757763077065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Robust Restless Bandits, a challenging generalization of
restless multi-arm bandits (RMAB). RMABs have been widely studied for
intervention planning with limited resources. However, most works make the
unrealistic assumption that the transition dynamics are known perfectly,
restricting the applicability of existing methods to real-world scenarios. To
make RMABs more useful in settings with uncertain dynamics: (i) We introduce
the Robust RMAB problem and develop solutions for a minimax regret objective
when transitions are given by interval uncertainties; (ii) We develop a double
oracle algorithm for solving Robust RMABs and demonstrate its effectiveness on
three experimental domains; (iii) To enable our double oracle approach, we
introduce RMABPPO, a novel deep reinforcement learning algorithm for solving
RMABs. RMABPPO hinges on learning an auxiliary "$\lambda$-network" that allows
each arm's learning to decouple, greatly reducing sample complexity required
for training; (iv) Under minimax regret, the adversary in the double oracle
approach is notoriously difficult to implement due to non-stationarity. To
address this, we formulate the adversary oracle as a multi-agent reinforcement
learning problem and solve it with a multi-agent extension of RMABPPO, which
may be of independent interest as the first known algorithm for this setting.
Code is available at https://github.com/killian-34/RobustRMAB.
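To make the decoupling in (iii) concrete, here is a minimal sketch of the Lagrangian idea behind the "$\lambda$-network", assuming a binary-action RMAB with unit acting costs; the class and function names are illustrative, not taken from the authors' repository:

```python
# Minimal sketch of the Lagrangian decoupling behind a lambda-network,
# assuming a binary-action RMAB; names and architecture are illustrative,
# not the paper's exact implementation.
import torch
import torch.nn as nn

class LambdaNetwork(nn.Module):
    """Maps the joint state of all arms to a budget multiplier lambda >= 0."""
    def __init__(self, n_arms: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_arms, hidden), nn.ReLU(), nn.Linear(hidden, 1),
            nn.Softplus(),  # keep lambda non-negative
        )

    def forward(self, joint_state: torch.Tensor) -> torch.Tensor:
        return self.net(joint_state)

def decoupled_rewards(rewards, actions, lam):
    """Lagrangian reward for each arm: r_i - lambda * c(a_i).
    With this penalty, each arm's policy can be trained independently
    (e.g., with PPO), since the budget coupling enters only through lambda."""
    costs = actions.float()  # unit cost for acting, zero otherwise
    return rewards - lam * costs

# toy usage: 5 arms with binary states and actions
n_arms = 5
lam_net = LambdaNetwork(n_arms)
state = torch.randint(0, 2, (n_arms,)).float()
lam = lam_net(state)                      # scalar multiplier for this state
rewards = torch.rand(n_arms)
actions = torch.randint(0, 2, (n_arms,))
print(decoupled_rewards(rewards, actions, lam.squeeze()))
```

The point of the sketch: once $\lambda$ prices the budget, each arm's penalized reward depends only on that arm's own state and action, so per-arm policies can be trained separately, which is what reduces the sample complexity.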
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
This problem setting elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits [16.054685587034836]
GINO-Q is a three-timescale approximation algorithm designed to learn an asymptotically optimal index policy for restless multi-armed bandits (RMABs).
GINO-Q does not require RMABs to be indexable, enhancing its flexibility and applicability.
Our experimental results demonstrate that GINO-Q consistently learns near-optimal policies, even for non-indexable RMABs.
arXiv Detail & Related papers (2024-08-19T10:50:45Z)
- A Federated Online Restless Bandit Framework for Cooperative Resource Allocation [23.698976872351576]
We study the cooperative resource allocation problem with unknown system dynamics of Markov reward processes (MRPs).
We put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem.
Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance compared with baselines.
arXiv Detail & Related papers (2024-06-12T08:34:53Z)
- Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond [58.39457881271146]
We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT).
Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables.
Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution.
arXiv Detail & Related papers (2024-06-03T14:48:53Z)
- Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune a generalization of online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
- Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds [46.30750729936261]
Follow-the-regularized-leader (FTRL) has recently emerged as one of the most promising approaches for obtaining various types of adaptivity in bandit problems.
We establish several algorithms with three types of adaptivity: sparsity, game-dependency, and best-of-both-worlds (BOBW).
arXiv Detail & Related papers (2023-05-26T23:20:48Z)
- Optimistic Whittle Index Policy: Online Learning for Restless Bandits [31.312043984489666]
We propose the first online learning algorithm based on the Whittle index policy to learn transition dynamics.
Our algorithm, UCWhittle, achieves sublinear $O(\sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions; a minimal sketch of the optimism idea follows this entry.
arXiv Detail & Related papers (2022-05-30T18:32:20Z)
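As referenced in the entry above, here is a minimal sketch of the optimism-in-the-face-of-uncertainty idea behind an online Whittle index approach, assuming two-state arms and a Hoeffding-style bonus; the function name and constants are illustrative, not UCWhittle's exact construction:

```python
# Minimal sketch of optimistic transition estimation for an online
# Whittle-index policy: keep empirical transition counts per (state, action)
# and plan with upper-confidence estimates of the "good" transitions.
import numpy as np

def optimistic_transition(counts_sa: np.ndarray, t: int, delta: float = 0.05):
    """UCB-style estimate of P(next state = 1 | s, a) for a 2-state arm.

    counts_sa: [n_to_0, n_to_1] -- visits to (s, a) that led to state 0 / 1.
    Returns the empirical probability plus a Hoeffding-style bonus, clipped
    to [0, 1]; an optimistic planner then computes Whittle indices under
    these inflated transition probabilities."""
    n = max(1, counts_sa.sum())
    p_hat = counts_sa[1] / n
    bonus = np.sqrt(np.log(2 * t / delta) / (2 * n))
    return min(1.0, p_hat + bonus)

# toy usage: after 10 visits to (s, a) with 6 transitions into the good state
print(optimistic_transition(np.array([4, 6]), t=100))
```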
- Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits [91.8283876874947]
We design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm.
We show that no algorithm can simultaneously perform optimally for both the regret minimization (RM) and BAI objectives.
We also show that BoBW-lil'UCB$(\gamma)$ outperforms a competitor in terms of time complexity and regret.
arXiv Detail & Related papers (2021-10-16T17:52:32Z)
- Q-Learning Lagrange Policies for Multi-Action Restless Bandits [35.022322303796216]
Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed.
We design the first algorithms for learning good policies for multi-action RMABs online using combinations of Lagrangian relaxation and Q-learning; a minimal sketch of this pairing follows this entry.
arXiv Detail & Related papers (2021-06-22T19:20:09Z)
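As referenced in the entry above, here is a minimal sketch of pairing Lagrangian relaxation with tabular Q-learning for a multi-action RMAB; the cost vector, step sizes, and update schedule are illustrative assumptions rather than the paper's exact algorithm:

```python
# Minimal sketch of Lagrangian relaxation + tabular Q-learning for a
# multi-action RMAB, assuming per-arm Q-tables and a scalar budget.
import numpy as np

n_arms, n_states, n_actions, budget = 3, 2, 3, 2
costs = np.array([0.0, 1.0, 2.0])        # illustrative cost per action level
Q = np.zeros((n_arms, n_states, n_actions))
lam, alpha, gamma, eta = 1.0, 0.1, 0.95, 0.01

def q_update(i, s, a, r, s_next):
    """Q-learning on the Lagrangian reward r - lam * cost(a): the multiplier
    prices the budget, so each arm can learn from its own transitions."""
    target = (r - lam * costs[a]) + gamma * Q[i, s_next].max()
    Q[i, s, a] += alpha * (target - Q[i, s, a])

def lam_update(actions):
    """Subgradient step on the multiplier: raise lam when the greedy joint
    action overspends the budget, lower it when there is slack."""
    global lam
    spend = costs[actions].sum()
    lam = max(0.0, lam + eta * (spend - budget))

# toy usage: one transition for arm 0, then one multiplier step
q_update(0, s=1, a=2, r=1.0, s_next=0)
lam_update(np.array([2, 0, 1]))  # joint action taken this round
print(Q[0, 1, 2], lam)
```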
- Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration. A minimal sketch of both ingredients follows this entry.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
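As referenced in the entry above, here is a minimal sketch of SUNRISE's two ingredients over a Q-ensemble, assuming discrete actions; the temperature and exploration constants are illustrative:

```python
# Minimal sketch of SUNRISE's weighted Bellman backup and UCB action
# selection on a Q-ensemble, assuming discrete actions; constants are
# illustrative, not the paper's tuned hyperparameters.
import torch

def weighted_backup_weights(target_qs: torch.Tensor, temperature: float = 10.0):
    """Down-weight Bellman targets where the ensemble disagrees.

    target_qs: [ensemble, batch] target Q-values from each ensemble member.
    Returns per-sample weights that shrink as the ensemble's standard
    deviation grows, stabilizing the backup under uncertainty."""
    std = target_qs.std(dim=0)
    return torch.sigmoid(-std * temperature) + 0.5

def ucb_action(q_values: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Pick the action maximizing mean + beta * std across the ensemble.

    q_values: [ensemble, batch, actions]."""
    mean, std = q_values.mean(dim=0), q_values.std(dim=0)
    return (mean + beta * std).argmax(dim=-1)

# toy usage: ensemble of 5 critics, batch of 4 states, 3 actions
qs = torch.randn(5, 4, 3)
print(ucb_action(qs))
print(weighted_backup_weights(qs.max(dim=-1).values))
```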
This list is automatically generated from the titles and abstracts of the papers in this site.