Collapsing Bandits and Their Application to Public Health Interventions
- URL: http://arxiv.org/abs/2007.04432v1
- Date: Sun, 5 Jul 2020 00:33:30 GMT
- Title: Collapsing Bandits and Their Application to Public Health Interventions
- Authors: Aditya Mate, Jackson A. Killian, Haifeng Xu, Andrew Perrault, Milind
Tambe
- Abstract summary: Collapsing Bandits is a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process.
We build on the Whittle index technique for RMABs to derive conditions under which the Collapsing Bandits problem is indexable.
Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques.
- Score: 45.45852113386041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and study Collapsing Bandits, a new restless multi-armed bandit
(RMAB) setting in which each arm follows a binary-state Markovian process with
a special structure: when an arm is played, the state is fully observed, thus
"collapsing" any uncertainty, but when an arm is passive, no observation is
made, thus allowing uncertainty to evolve. The goal is to keep as many arms in
the "good" state as possible by planning a limited budget of actions per round.
Such Collapsing Bandits are natural models for many healthcare domains in which
workers must simultaneously monitor patients and deliver interventions in a way
that maximizes the health of their patient cohort. Our main contributions are
as follows: (i) Building on the Whittle index technique for RMABs, we derive
conditions under which the Collapsing Bandits problem is indexable. Our
derivation hinges on novel conditions that characterize when the optimal
policies may take the form of either "forward" or "reverse" threshold policies.
(ii) We exploit the optimality of threshold policies to build fast algorithms
for computing the Whittle index, including a closed-form. (iii) We evaluate our
algorithm on several data distributions including data from a real-world
healthcare task in which a worker must monitor and deliver interventions to
maximize their patients' adherence to tuberculosis medication. Our algorithm
achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB
techniques while achieving similar performance.
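The core dynamics described in the abstract can be illustrated with a small simulation. The sketch below is a simplified, hypothetical model, not the paper's algorithm: each arm is a two-state Markov chain, a passive arm's belief (probability of being in the "good" state) evolves under the transition matrix, and playing an arm observes its state and collapses the belief to 0 or 1. In place of the Whittle index, it uses a myopic stand-in that acts on the arms with the lowest beliefs; in the paper, acting also changes the transition dynamics, which this sketch omits.

```python
import numpy as np

def propagate(belief, P):
    # One passive step: the belief of being in the "good" state (state 1)
    # evolves under the two-state transition matrix P, where
    # P[s][1] = Pr(next state is good | current state is s).
    return belief * P[1][1] + (1.0 - belief) * P[0][1]

def simulate(P, n_arms=5, budget=2, horizon=50, seed=0):
    # Hypothetical cohort simulation: 'budget' arms may be played per round.
    rng = np.random.default_rng(seed)
    states = rng.integers(0, 2, size=n_arms)   # true (hidden) states
    beliefs = np.full(n_arms, 0.5)             # P(arm is in the good state)
    total_reward = 0.0
    for _ in range(horizon):
        # Myopic stand-in for the Whittle index: intervene on the arms
        # believed most likely to be in the bad state.
        act = np.argsort(beliefs)[:budget]
        for a in range(n_arms):
            # True state transitions every round; in this simplified sketch
            # the action only affects observation, not the dynamics.
            states[a] = rng.random() < P[states[a]][1]
            if a in act:
                beliefs[a] = float(states[a])  # observation collapses belief
            else:
                beliefs[a] = propagate(beliefs[a], P)
        total_reward += states.sum()           # reward = arms in good state
    return total_reward
```

The "collapsing" structure is visible in the two belief-update branches: a played arm's uncertainty vanishes, while a passive arm's belief drifts toward the chain's stationary distribution, which is what makes threshold-style index policies natural for this setting.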
Related papers
- Best-Arm Identification in Unimodal Bandits [24.001611176749158]
We study the fixed-confidence best-arm identification problem in unimodal bandits.
We derive two lower bounds on the stopping time of any algorithm.
We show that a linear dependence on the number of arms is unavoidable in the confidence-independent cost.
arXiv Detail & Related papers (2024-11-04T09:05:11Z) - Robust Best-arm Identification in Linear Bandits [25.91361349646875]
We study the robust best-arm identification problem (RBAI) in the case of linear rewards.
We present an instance-dependent lower bound for the robust best-arm identification problem with linear rewards.
Our algorithms prove to be effective in identifying robust dosage values across various age ranges of patients.
arXiv Detail & Related papers (2023-11-08T14:58:11Z) - Optimal Best Arm Identification with Fixed Confidence in Restless Bandits [66.700654953613]
We study best arm identification in a restless multi-armed bandit setting with finitely many arms.
The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space.
It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds.
arXiv Detail & Related papers (2023-10-20T10:04:05Z) - Limited Resource Allocation in a Non-Markovian World: The Case of
Maternal and Child Healthcare [27.812174610119452]
We consider the problem of scheduling interventions in low resource settings to increase adherence and/or engagement.
Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem.
We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN.
To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to predict future states, and (iii) propose the Time
arXiv Detail & Related papers (2023-05-22T02:26:29Z) - Federated Best Arm Identification with Heterogeneous Clients [62.36929749450298]
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients.
We consider any algorithm whose upper bound on the expected stopping time matches the lower bound up to a multiplicative constant (an almost-optimal algorithm).
We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is almost-optimal.
arXiv Detail & Related papers (2022-10-14T13:09:11Z) - Towards Soft Fairness in Restless Multi-Armed Bandits [8.140037969280716]
Restless multi-armed bandits (RMAB) is a framework for allocating limited resources under uncertainty.
To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint.
We then provide an approach to enforce the soft fairness constraint in RMABs.
arXiv Detail & Related papers (2022-07-27T07:56:32Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Efficient Algorithms for Finite Horizon and Streaming Restless
Multi-Armed Bandit Problems [30.759279275710078]
We propose a new and scalable approach to computing index-based solutions.
We provide algorithms designed to capture index decay without having to solve the costly finite horizon problem.
Our algorithms achieve an over 150x speed-up over existing methods in these tasks without loss in performance.
arXiv Detail & Related papers (2021-03-08T13:10:31Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.