Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with
Application to Maternal and Child Health
- URL: http://arxiv.org/abs/2202.00916v4
- Date: Sun, 13 Aug 2023 05:44:37 GMT
- Title: Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with
Application to Maternal and Child Health
- Authors: Kai Wang, Shresth Verma, Aditya Mate, Sanket Shah, Aparna Taneja, Neha
Madhiwalla, Aparna Hegde, Milind Tambe
- Abstract summary: This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features.
The goal is to learn a model that predicts transition dynamics from features, with the Whittle index policy solving the RMAB problems using the predicted transitions. Prior works, however, typically train the model to maximize predictive accuracy rather than final RMAB solution quality, creating a mismatch between training and evaluation objectives.
To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality.
- Score: 36.442133189056136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies restless multi-armed bandit (RMAB) problems with unknown
arm transition dynamics but with known correlated arm features. The goal is to
learn a model to predict transition dynamics given features, where the Whittle
index policy solves the RMAB problems using predicted transitions. However,
prior works often learn the model by maximizing the predictive accuracy instead
of final RMAB solution quality, causing a mismatch between training and
evaluation objectives. To address this shortcoming, we propose a novel approach
for decision-focused learning in RMAB that directly trains the predictive model
to maximize the Whittle index solution quality. We present three key
contributions: (i) we establish differentiability of the Whittle index policy
to support decision-focused learning; (ii) we significantly improve the
scalability of decision-focused learning approaches in sequential problems,
specifically RMAB problems; (iii) we apply our algorithm to a previously
collected dataset of maternal and child health to demonstrate its performance.
Indeed, our algorithm is the first for decision-focused learning in RMAB that
scales to real-world problem sizes.
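To make the training objective concrete, below is a minimal, hypothetical sketch of the decision-focused pipeline the abstract describes: a model predicts per-arm transition probabilities from features, an index-style score ranks arms, a differentiable relaxation of the budgeted selection stands in for the Whittle index policy, and the loss is the negative expected reward of the induced decisions rather than predictive accuracy. The `index_surrogate` and `soft_topk` functions are illustrative assumptions, not the paper's actual differentiable Whittle index construction.

```python
# Hypothetical sketch of decision-focused learning for 2-state RMAB arms.
# Not the authors' implementation: the index surrogate and the soft top-k
# relaxation are placeholders chosen to keep the pipeline differentiable.
import torch
import torch.nn as nn

N_ARMS, N_FEATURES, BUDGET = 20, 8, 5

class TransitionModel(nn.Module):
    """Maps arm features to P(next state = 1 | state, action) for 2-state arms."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_FEATURES, 32), nn.ReLU(),
                                 nn.Linear(32, 4), nn.Sigmoid())

    def forward(self, feats):                  # feats: (N_ARMS, N_FEATURES)
        # Columns: [s=0,a=0], [s=0,a=1], [s=1,a=0], [s=1,a=1]
        return self.net(feats)

def index_surrogate(p):
    # Illustrative stand-in for a Whittle index: average improvement in the
    # chance of reaching the good state when the arm is acted on. A real
    # Whittle index would be derived from each arm's MDP.
    return 0.5 * ((p[:, 1] - p[:, 0]) + (p[:, 3] - p[:, 2]))

def soft_topk(scores, k, temp=0.1):
    # Crude differentiable relaxation of "act on the k highest-index arms":
    # a temperature-scaled softmax rescaled so the activation mass sums to k.
    return k * torch.softmax(scores / temp, dim=0)

def decision_loss(p, act_prob, state):
    # Negative expected one-step reward (reward = occupying state 1) under the
    # soft policy; this is a decision-quality objective, not predictive accuracy.
    p_passive = torch.where(state.bool(), p[:, 2], p[:, 0])
    p_active = torch.where(state.bool(), p[:, 3], p[:, 1])
    return -((1 - act_prob) * p_passive + act_prob * p_active).sum()

model = TransitionModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
feats = torch.randn(N_ARMS, N_FEATURES)            # stand-in arm features
state = torch.randint(0, 2, (N_ARMS,)).float()     # current arm states

for _ in range(100):
    p = model(feats)                               # predicted transitions
    act_prob = soft_topk(index_surrogate(p), BUDGET)
    loss = decision_loss(p, act_prob, state)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Replacing `decision_loss` with a cross-entropy on observed transitions would recover the standard two-stage, accuracy-maximizing baseline that the abstract argues against.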
Related papers
- Decision-focused Graph Neural Networks for Combinatorial Optimization [62.34623670845006]
An emerging strategy to tackle optimization problems involves the adoption of graph neural networks (GNNs) as an alternative to traditional algorithms.
Despite the growing popularity of GNNs and traditional algorithm solvers in the realm of CO, there is limited research on their integrated use and the correlation between them within an end-to-end framework.
We introduce a decision-focused framework that utilizes GNNs to address CO problems with auxiliary support.
arXiv Detail & Related papers (2024-06-05T22:52:27Z)
- Stochastic Methods for AUC Optimization subject to AUC-based Fairness Constraints [51.12047280149546]
A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints.
We formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
We demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
arXiv Detail & Related papers (2022-12-23T22:29:08Z)
- Optimistic Whittle Index Policy: Online Learning for Restless Bandits [31.312043984489666]
We propose the first online learning algorithm based on the Whittle index policy to learn transition dynamics.
Our algorithm, UCWhittle, achieves sublinear $O(\sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions (a rough sketch of the optimism-over-transitions idea appears after this list).
arXiv Detail & Related papers (2022-05-30T18:32:20Z)
- The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z)
- Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning [31.515757763077065]
We introduce Robust Restless Bandits, a generalization of restless multi-armed bandits (RMABs).
We develop solutions for a minimax regret objective when transitions are given by interval uncertainties.
We introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs.
arXiv Detail & Related papers (2021-07-04T17:21:26Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch [60.23815709215807]
We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner.
We propose a robust MCE IRL algorithm, which is a principled approach to help with this mismatch.
arXiv Detail & Related papers (2020-07-02T14:57:13Z)
- Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping [4.5497948012757865]
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping.
The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function.
Our method outperforms the state-of-the-art sparse cooperative Q-learning algorithm on both the well-known SysAdmin benchmark and randomized environments.
arXiv Detail & Related papers (2020-01-15T19:13:44Z)
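The Optimistic Whittle Index Policy entry above learns transition dynamics online by planning with optimistic estimates. The following is a rough, self-contained sketch of that optimism-over-transitions idea under simplifying assumptions: two-state arms, an ad hoc confidence radius, and a stand-in greedy index in place of UCWhittle's actual optimistic Whittle-index computation.

```python
# Hypothetical sketch: keep empirical transition estimates with confidence
# radii, plan with the optimistic estimates, act on a budget of arms, and
# update counts from the observed transitions.
import numpy as np

N_ARMS, BUDGET, HORIZON = 10, 3, 500
rng = np.random.default_rng(0)
true_p = rng.uniform(0.2, 0.9, size=(N_ARMS, 2, 2))   # true P(s'=1 | s, a), unknown to the learner

counts = np.ones((N_ARMS, 2, 2))            # visits of (arm, state, action), with a unit prior
successes = 0.5 * np.ones((N_ARMS, 2, 2))   # observed transitions into the good state
state = np.zeros(N_ARMS, dtype=int)

for t in range(1, HORIZON + 1):
    p_hat = successes / counts
    bonus = np.sqrt(np.log(t + 1) / counts)             # ad hoc confidence radius
    p_opt = np.clip(p_hat + bonus, 0.0, 1.0)             # optimistic transition estimates
    # Stand-in index: optimistic gain from acting in each arm's current state.
    gain = p_opt[np.arange(N_ARMS), state, 1] - p_opt[np.arange(N_ARMS), state, 0]
    acted = np.argsort(gain)[-BUDGET:]                   # act on the top-BUDGET arms
    action = np.zeros(N_ARMS, dtype=int)
    action[acted] = 1
    next_state = (rng.random(N_ARMS) < true_p[np.arange(N_ARMS), state, action]).astype(int)
    counts[np.arange(N_ARMS), state, action] += 1
    successes[np.arange(N_ARMS), state, action] += next_state
    state = next_state
```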