Towards a Pretrained Model for Restless Bandits via Multi-arm
Generalization
- URL: http://arxiv.org/abs/2310.14526v3
- Date: Tue, 30 Jan 2024 02:35:51 GMT
- Title: Towards a Pretrained Model for Restless Bandits via Multi-arm
Generalization
- Authors: Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj
Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe
- Abstract summary: Restless multi-arm bandits (RMABs) are resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching.
We develop a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs.
- Score: 32.90636136408938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restless multi-arm bandits (RMABs), a class of resource allocation problems
with broad application in areas such as healthcare, online advertising, and
anti-poaching, have recently been studied from a multi-agent reinforcement
learning perspective. Prior RMAB research suffers from several limitations,
e.g., it fails to adequately address continuous states, and requires retraining
from scratch when arms opt-in and opt-out over time, a common challenge in many
real world applications. We address these limitations by developing a neural
network-based pre-trained model (PreFeRMAB) that has general zero-shot ability
on a wide range of previously unseen RMABs, and which can be fine-tuned on
specific instances in a more sample-efficient way than retraining from scratch.
Our model also accommodates general multi-action settings and discrete or
continuous state spaces. To enable fast generalization, we learn a novel single
policy network model that utilizes feature information and employs a training
procedure in which arms opt-in and out over time. We derive a new update rule
for a crucial $\lambda$-network with theoretical convergence guarantees and
empirically demonstrate the advantages of our approach on several challenging,
real-world inspired problems.
Related papers
- A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z) - Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z) - Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond [58.39457881271146]
We introduce a novel framework of multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT)
Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables.
Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution.
arXiv Detail & Related papers (2024-06-03T14:48:53Z) - A General Framework for Learning from Weak Supervision [93.89870459388185]
This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm.
Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources.
We also present an advanced algorithm that significantly simplifies the EM computational demands.
arXiv Detail & Related papers (2024-02-02T21:48:50Z) - Multimodal Guidance Network for Missing-Modality Inference in Content Moderation [6.933986643759809]
We propose a novel guidance network that promotes knowledge sharing during training.
We show that our proposed framework trains single-modality models that significantly outperform traditionally trained counterparts.
arXiv Detail & Related papers (2023-09-07T02:26:55Z) - Self-regulating Prompts: Foundational Model Adaptation without
Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery [76.63807209414789]
We challenge the status quo in class-iNCD and propose a learning paradigm where class discovery occurs continuously and truly unsupervisedly.
We propose simple baselines, composed of a frozen PTM backbone and a learnable linear classifier, that are not only simple to implement but also resilient under longer learning scenarios.
arXiv Detail & Related papers (2023-03-28T13:47:16Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - Generalization of Deep Reinforcement Learning for Jammer-Resilient
Frequency and Power Allocation [4.436632973105495]
We tackle the problem of joint frequency and power allocation while emphasizing the generalization capability of a deep reinforcement learning model.
We show the improved training and inference performance of the proposed methods when tested on previously unseen simulated wireless networks.
The end-to-end solution was implemented on the embedded software-defined radio and validated using over-the-air evaluation.
arXiv Detail & Related papers (2023-02-04T22:15:32Z) - Global-Local Regularization Via Distributional Robustness [26.983769514262736]
Deep neural networks are often vulnerable to adversarial examples and distribution shifts.
Recent approaches leverage distributional robustness optimization (DRO) to find the most challenging distribution.
We propose a novel regularization technique, following the veins of Wasserstein-based DRO framework.
arXiv Detail & Related papers (2022-03-01T15:36:12Z) - Robust Restless Bandits: Tackling Interval Uncertainty with Deep
Reinforcement Learning [31.515757763077065]
We introduce Robust Restless Bandits, a generalization of restless multi-arm bandits (RMAB)
We develop solutions for a minimax regret objective when transitions are given by interval uncertainties.
We introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs.
arXiv Detail & Related papers (2021-07-04T17:21:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.