Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
- URL: http://arxiv.org/abs/2406.01386v1
- Date: Mon, 3 Jun 2024 14:48:53 GMT
- Title: Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
- Authors: Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C. S. Lui, Wei Chen,
- Abstract summary: We introduce a novel framework of multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT)
Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables.
Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution.
- Score: 58.39457881271146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional multivariant random variable and the feedback follows a general arm triggering process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables. For CMAB-MT, we propose a general 1-norm multivariant and triggering probability-modulated smoothness condition, and an optimistic CUCB-MT algorithm built upon this condition. Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution, all of which meet the above smoothness condition and achieve matching or improved regret bounds compared to existing works. Through our new framework, we build the first connection between the episodic RL and CMAB literature, by offering a new angle to solve the episodic RL through the lens of CMAB, which may encourage more interactions between these two important directions.
Related papers
- GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits [16.054685587034836]
GINO-Q is a three-timescale approximation algorithm designed to learn an optimal index policy for restless multi-armed bandit (RMAB)
GINO-Q does not require RMABs to be indexable, enhancing its flexibility and applicability.
Our experimental results demonstrate that GINO-Q consistently learns near optimal policies, even for non-indexable RMABs.
arXiv Detail & Related papers (2024-08-19T10:50:45Z) - Towards a Pretrained Model for Restless Bandits via Multi-arm
Generalization [32.90636136408938]
Restless multi-arm bandits (RMABs) are resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching.
We develop a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs.
arXiv Detail & Related papers (2023-10-23T03:16:32Z) - Non-stationary Reinforcement Learning under General Function
Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games
to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with
Probabilistically Triggered Arms or Independent Arms [53.89752069984382]
We study the semi-bandits (CMAB) and focus on reducing the dependency of the batch-size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages on the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z) - A Novel Unified Conditional Score-based Generative Framework for
Multi-modal Medical Image Completion [54.512440195060584]
We propose the Unified Multi-Modal Conditional Score-based Generative Model (UMM-CSGM) to take advantage of Score-based Generative Model (SGM)
UMM-CSGM employs a novel multi-in multi-out Conditional Score Network (mm-CSN) to learn a comprehensive set of cross-modal conditional distributions.
Experiments on BraTS19 dataset show that the UMM-CSGM can more reliably synthesize the heterogeneous enhancement and irregular area in tumor-induced lesions.
arXiv Detail & Related papers (2022-07-07T16:57:21Z) - Robust Restless Bandits: Tackling Interval Uncertainty with Deep
Reinforcement Learning [31.515757763077065]
We introduce Robust Restless Bandits, a generalization of restless multi-arm bandits (RMAB)
We develop solutions for a minimax regret objective when transitions are given by interval uncertainties.
We introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs.
arXiv Detail & Related papers (2021-07-04T17:21:26Z) - Q-Learning Lagrange Policies for Multi-Action Restless Bandits [35.022322303796216]
Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed.
We design the first algorithms for learning good policies for Multi-action RMABs online using combinations of Lagrangian relaxation and Q-learning.
arXiv Detail & Related papers (2021-06-22T19:20:09Z) - UPDeT: Universal Multi-agent Reinforcement Learning via Policy
Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing one single architecture to fit tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named as Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.