Collapsing Bandits and Their Application to Public Health Interventions
- URL: http://arxiv.org/abs/2007.04432v1
- Date: Sun, 5 Jul 2020 00:33:30 GMT
- Title: Collapsing Bandits and Their Application to Public Health Interventions
- Authors: Aditya Mate, Jackson A. Killian, Haifeng Xu, Andrew Perrault, Milind
Tambe
- Abstract summary: Collapsing Bandits is a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process.
We build on the Whittle index technique for RMABs to derive conditions under which the Collapsing Bandits problem is indexable.
Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques.
- Score: 45.45852113386041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and study Collapsing Bandits, a new restless multi-armed bandit
(RMAB) setting in which each arm follows a binary-state Markovian process with
a special structure: when an arm is played, the state is fully observed, thus
"collapsing" any uncertainty, but when an arm is passive, no observation is
made, thus allowing uncertainty to evolve. The goal is to keep as many arms in
the "good" state as possible by planning a limited budget of actions per round.
Such Collapsing Bandits are natural models for many healthcare domains in which
workers must simultaneously monitor patients and deliver interventions in a way
that maximizes the health of their patient cohort. Our main contributions are
as follows: (i) Building on the Whittle index technique for RMABs, we derive
conditions under which the Collapsing Bandits problem is indexable. Our
derivation hinges on novel conditions that characterize when the optimal
policies may take the form of either "forward" or "reverse" threshold policies.
(ii) We exploit the optimality of threshold policies to build fast algorithms
for computing the Whittle index, including a closed-form. (iii) We evaluate our
algorithm on several data distributions including data from a real-world
healthcare task in which a worker must monitor and deliver interventions to
maximize their patients' adherence to tuberculosis medication. Our algorithm
achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB
techniques while achieving similar performance.
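The core dynamics described in the abstract can be illustrated with a small simulation. The sketch below is a simplified, hypothetical model, not the paper's algorithm: each arm is a two-state Markov chain, a passive arm's belief (probability of being in the "good" state) evolves under the transition matrix, and playing an arm observes its state and collapses the belief to 0 or 1. In place of the Whittle index, it uses a myopic stand-in that acts on the arms with the lowest beliefs; in the paper, acting also changes the transition dynamics, which this sketch omits.

```python
import numpy as np

def propagate(belief, P):
    # One passive step: the belief of being in the "good" state (state 1)
    # evolves under the two-state transition matrix P, where
    # P[s][1] = Pr(next state is good | current state is s).
    return belief * P[1][1] + (1.0 - belief) * P[0][1]

def simulate(P, n_arms=5, budget=2, horizon=50, seed=0):
    # Hypothetical cohort simulation: 'budget' arms may be played per round.
    rng = np.random.default_rng(seed)
    states = rng.integers(0, 2, size=n_arms)   # true (hidden) states
    beliefs = np.full(n_arms, 0.5)             # P(arm is in the good state)
    total_reward = 0.0
    for _ in range(horizon):
        # Myopic stand-in for the Whittle index: intervene on the arms
        # believed most likely to be in the bad state.
        act = np.argsort(beliefs)[:budget]
        for a in range(n_arms):
            # True state transitions every round; in this simplified sketch
            # the action only affects observation, not the dynamics.
            states[a] = rng.random() < P[states[a]][1]
            if a in act:
                beliefs[a] = float(states[a])  # observation collapses belief
            else:
                beliefs[a] = propagate(beliefs[a], P)
        total_reward += states.sum()           # reward = arms in good state
    return total_reward
```

The "collapsing" structure is visible in the two belief-update branches: a played arm's uncertainty vanishes, while a passive arm's belief drifts toward the chain's stationary distribution, which is what makes threshold-style index policies natural for this setting.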
Related papers
- Best-Arm Identification in Unimodal Bandits [24.001611176749158]
We study the fixed-confidence best-arm identification problem in unimodal bandits.
We derive two lower bounds on the stopping time of any algorithm.
We show that a linear dependence on the number of arms is unavoidable in the confidence-independent cost.
arXiv Detail & Related papers (2024-11-04T09:05:11Z) - Robust Best-arm Identification in Linear Bandits [25.91361349646875]
We study the robust best-arm identification problem (RBAI) in the case of linear rewards.
We present an instance-dependent lower bound for the robust best-arm identification problem with linear rewards.
Our algorithms prove to be effective in identifying robust dosage values across various age ranges of patients.
arXiv Detail & Related papers (2023-11-08T14:58:11Z) - Optimal Best Arm Identification with Fixed Confidence in Restless Bandits [66.700654953613]
We study best arm identification in a restless multi-armed bandit setting with finitely many arms.
The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space.
It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds.
arXiv Detail & Related papers (2023-10-20T10:04:05Z) - Limited Resource Allocation in a Non-Markovian World: The Case of
Maternal and Child Healthcare [27.812174610119452]
We consider the problem of scheduling interventions in low resource settings to increase adherence and/or engagement.
Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem.
We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN.
To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to predict future states, and (iii) propose the Time
arXiv Detail & Related papers (2023-05-22T02:26:29Z) - Federated Best Arm Identification with Heterogeneous Clients [62.36929749450298]
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients.
We consider any algorithm whose upper bound on the expected stopping time matches the lower bound up to a multiplicative constant (an almost-optimal algorithm).
We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is almost-optimal.
arXiv Detail & Related papers (2022-10-14T13:09:11Z) - Towards Soft Fairness in Restless Multi-Armed Bandits [8.140037969280716]
Restless multi-armed bandits (RMAB) is a framework for allocating limited resources under uncertainty.
To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint.
We then provide an approach to enforce the soft fairness constraint in RMABs.
arXiv Detail & Related papers (2022-07-27T07:56:32Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Efficient Algorithms for Finite Horizon and Streaming Restless
Multi-Armed Bandit Problems [30.759279275710078]
We propose a new and scalable approach to computing index-based solutions.
We provide algorithms designed to capture index decay without having to solve the costly finite horizon problem.
Our algorithms achieve an over 150x speed-up over existing methods in these tasks without loss in performance.
arXiv Detail & Related papers (2021-03-08T13:10:31Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.