Equitable Restless Multi-Armed Bandits: A General Framework Inspired By
Digital Health
- URL: http://arxiv.org/abs/2308.09726v1
- Date: Thu, 17 Aug 2023 13:00:27 GMT
- Title: Equitable Restless Multi-Armed Bandits: A General Framework Inspired By
Digital Health
- Authors: Jackson A. Killian, Manish Jain, Yugang Jia, Jonathan Amar, Erich
Huang, Milind Tambe
- Abstract summary: Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources.
RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health.
We study equitable objectives for RMABs for the first time. We consider two equity-aligned objectives from the fairness literature, minimax reward and max Nash welfare.
We develop efficient algorithms for solving each -- a water filling algorithm for the former, and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes for the latter.
- Score: 23.762981395335217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restless multi-armed bandits (RMABs) are a popular framework for algorithmic
decision making in sequential settings with limited resources. RMABs are
increasingly being used for sensitive decisions such as in public health,
treatment scheduling, anti-poaching, and -- the motivation for this work --
digital health. For such high stakes settings, decisions must both improve
outcomes and prevent disparities between groups (e.g., ensure health equity).
We study equitable objectives for RMABs (ERMABs) for the first time. We
consider two equity-aligned objectives from the fairness literature, minimax
reward and max Nash welfare. We develop efficient algorithms for solving each
-- a water filling algorithm for the former, and a greedy algorithm with
theoretically motivated nuance to balance disparate group sizes for the latter.
Finally, we demonstrate across three simulation domains, including a new
digital health model, that our approaches can be multiple times more equitable
than the current state of the art without drastic sacrifices to utility. Our
findings underscore our work's urgency as RMABs permeate into systems that
impact human and wildlife outcomes. Code is available at
https://github.com/google-research/socialgood/tree/equitable-rmab
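For orientation, here is one hedged formalization of the two equity-aligned objectives named in the abstract. The notation is assumed for illustration, not copied from the paper: $\pi$ ranges over budget-feasible RMAB policies, $G$ is the set of groups, and $R_g(\pi)$ is the expected reward accrued by group $g$ under $\pi$.

```latex
% Hedged formalization; notation is assumed, not taken from the paper.
% \pi ranges over budget-feasible RMAB policies, G is the set of groups,
% and R_g(\pi) is the expected reward accrued by group g under \pi.

% Minimax reward: maximize the worst-off group's reward ("raise the floor").
\max_{\pi} \; \min_{g \in G} \; R_g(\pi)

% Max Nash welfare: maximize the product of group rewards, equivalently
% the sum of log rewards, trading total utility against equity.
\max_{\pi} \; \prod_{g \in G} R_g(\pi)
\quad \Longleftrightarrow \quad
\max_{\pi} \; \sum_{g \in G} \log R_g(\pi)
```

The sketch below illustrates the two algorithmic ideas the abstract names: water filling for minimax reward and a greedy log-gain rule for max Nash welfare. It is a minimal sketch, not the paper's implementation (see the linked repo); the reward model here, with known diminishing per-group marginal gains, and the function names are assumptions made purely for illustration, and the paper's size-aware corrections for the Nash greedy are omitted.

```python
"""Illustrative sketches of the two ERMAB solution ideas from the abstract.
NOT the paper's algorithms (see the linked repo); the reward model below,
with known non-negative diminishing marginal gains per group, is assumed."""

import math


def water_fill(rewards, marginal_gains, budget):
    """Minimax-reward heuristic: give each unit of budget to the currently
    worst-off group, raising the lowest "water level" first.

    rewards: current expected reward per group.
    marginal_gains: marginal_gains[g][k] is the reward gain for group g
        from its (k+1)-th allocated unit.
    budget: total number of intervention units to distribute.
    """
    rewards = list(rewards)
    alloc = [0] * len(rewards)
    for _ in range(budget):
        # Groups that can still absorb more budget.
        open_groups = [g for g in range(len(rewards))
                       if alloc[g] < len(marginal_gains[g])]
        if not open_groups:
            break  # every group saturated; leftover budget goes unused
        g = min(open_groups, key=lambda i: rewards[i])  # worst-off group
        rewards[g] += marginal_gains[g][alloc[g]]
        alloc[g] += 1
    return alloc, rewards


def nash_greedy(rewards, marginal_gains, budget):
    """Max-Nash-welfare heuristic: each step, make the allocation with the
    largest gain in log reward (the largest proportional gain). The paper
    adds theoretically motivated corrections for disparate group sizes;
    this sketch deliberately omits them. Requires rewards > 0.
    """
    rewards = list(rewards)
    alloc = [0] * len(rewards)
    for _ in range(budget):
        best_g, best_gain = None, -1.0
        for g in range(len(rewards)):
            if alloc[g] < len(marginal_gains[g]):
                gain = (math.log(rewards[g] + marginal_gains[g][alloc[g]])
                        - math.log(rewards[g]))
                if gain > best_gain:
                    best_g, best_gain = g, gain
        if best_g is None:
            break
        rewards[best_g] += marginal_gains[best_g][alloc[best_g]]
        alloc[best_g] += 1
    return alloc, rewards


if __name__ == "__main__":
    # Three groups with diminishing returns; group 1 starts worst off.
    gains = [[0.5, 0.3, 0.1], [0.4, 0.2, 0.1], [0.6, 0.4, 0.2]]
    print(water_fill([0.2, 0.1, 0.5], gains, budget=4))
    print(nash_greedy([0.2, 0.1, 0.5], gains, budget=4))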
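```

Running the sketch shows the two objectives diverge: water filling keeps topping up whichever group is worst off, while the Nash greedy favors the largest proportional improvement, which can direct budget to a better-off group when its relative gain is larger.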
Related papers
- Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.
The end goal is to identify the best arm of every objective in the shortest (expected) time subject to an upper bound on the probability of error.
We propose an algorithm that uses the novel idea of surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z)
- IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health [52.79219652923714]
This paper is the first to present the use of inverse reinforcement learning (IRL) to learn desired rewards for RMABs.
We demonstrate improved outcomes in a maternal and child health telehealth program.
arXiv Detail & Related papers (2024-12-11T15:28:04Z)
- A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health [29.894488663882328]
Large Language Models (LLMs) have emerged as adept automated planners across domains of robotic control and navigation.
We propose a Decision Language Model (DLM) for RMABs, enabling dynamic fine-tuning of RMAB policies using human-language commands.
arXiv Detail & Related papers (2024-02-22T18:58:27Z)
- Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z)
- Towards Soft Fairness in Restless Multi-Armed Bandits [8.140037969280716]
Restless multi-armed bandits (RMAB) is a framework for allocating limited resources under uncertainty.
To avoid starvation of interventions across individuals/regions/communities, we first introduce a soft fairness constraint.
We then provide an approach to enforce the soft fairness constraint in RMABs.
arXiv Detail & Related papers (2022-07-27T07:56:32Z)
- Efficient Resource Allocation with Fairness Constraints in Restless Multi-Armed Bandits [8.140037969280716]
Restless Multi-Armed Bandits (RMAB) is an apt model to represent decision-making problems in public health interventions.
In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms while maximizing expected value.
arXiv Detail & Related papers (2022-06-08T13:28:29Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems [30.759279275710078]
We propose a new and scalable approach to computing index-based solutions.
We provide algorithms designed to capture index decay without having to solve the costly finite horizon problem.
Our algorithms achieve more than a 150x speed-up over existing methods in these tasks without loss in performance.
arXiv Detail & Related papers (2021-03-08T13:10:31Z)
- Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
- Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward, and that it depends on the gap between the highest reward and the other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)