Equitable Restless Multi-Armed Bandits: A General Framework Inspired By
Digital Health
- URL: http://arxiv.org/abs/2308.09726v1
- Date: Thu, 17 Aug 2023 13:00:27 GMT
- Title: Equitable Restless Multi-Armed Bandits: A General Framework Inspired By
Digital Health
- Authors: Jackson A. Killian, Manish Jain, Yugang Jia, Jonathan Amar, Erich
Huang, Milind Tambe
- Abstract summary: Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources.
RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health.
We study equitable objectives for RMABs for the first time. We consider two equity-aligned objectives from the fairness literature, minimax reward and max Nash welfare.
We develop efficient algorithms for solving each -- a water filling algorithm for the former, and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes for the latter.
- Score: 23.762981395335217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restless multi-armed bandits (RMABs) are a popular framework for algorithmic
decision making in sequential settings with limited resources. RMABs are
increasingly being used for sensitive decisions such as in public health,
treatment scheduling, anti-poaching, and -- the motivation for this work --
digital health. For such high stakes settings, decisions must both improve
outcomes and prevent disparities between groups (e.g., ensure health equity).
We study equitable objectives for RMABs (ERMABs) for the first time. We
consider two equity-aligned objectives from the fairness literature, minimax
reward and max Nash welfare. We develop efficient algorithms for solving each
-- a water filling algorithm for the former, and a greedy algorithm with
theoretically motivated nuance to balance disparate group sizes for the latter.
Finally, we demonstrate across three simulation domains, including a new
digital health model, that our approaches can be multiple times more equitable
than the current state of the art without drastic sacrifices to utility. Our
findings underscore our work's urgency as RMABs permeate into systems that
impact human and wildlife outcomes. Code is available at
https://github.com/google-research/socialgood/tree/equitable-rmab
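To make the two objectives concrete, here is a minimal illustrative sketch of the two algorithmic ideas the abstract names. It is not the authors' implementation (see the linked repository for that): the `reward` interface, the discrete unit-budget model, and the group-size weighting in the Nash step are all simplifying assumptions.

```python
import math

# Illustrative sketch only -- NOT the paper's implementation. We assume a toy
# interface where reward(g, b) is group g's expected reward under b units of
# budget, and rewards are positive (so logs are defined).

def marginal_reward(reward, g, b):
    """Gain for group g from its (b+1)-th unit of budget (assumed toy model)."""
    return reward(g, b + 1) - reward(g, b)

def waterfill_minimax(groups, budget, reward):
    """Water filling for the minimax objective: repeatedly give the next
    unit of budget to the currently worst-off group, lifting the floor."""
    alloc = {g: 0 for g in groups}
    value = {g: reward(g, 0) for g in groups}
    for _ in range(budget):
        worst = min(groups, key=lambda g: value[g])
        value[worst] += marginal_reward(reward, worst, alloc[worst])
        alloc[worst] += 1
    return alloc

def greedy_nash(groups, budget, reward, sizes):
    """Greedy step for max Nash welfare: give each unit of budget to the group
    with the largest increase in sizes[g] * log(reward_g), i.e. the log
    objective weighted by group size -- one plausible way to balance disparate
    group sizes, echoing the 'nuance' the abstract mentions."""
    alloc = {g: 0 for g in groups}
    value = {g: reward(g, 0) for g in groups}
    for _ in range(budget):
        def gain(g):
            new = value[g] + marginal_reward(reward, g, alloc[g])
            return sizes[g] * (math.log(new) - math.log(value[g]))
        best = max(groups, key=gain)
        value[best] += marginal_reward(reward, best, alloc[best])
        alloc[best] += 1
    return alloc

# Toy usage: concave per-group reward curves with different baselines.
if __name__ == "__main__":
    groups = ["A", "B"]
    reward = lambda g, b: {"A": 1.0, "B": 3.0}[g] + math.sqrt(b)
    print(waterfill_minimax(groups, budget=5, reward=reward))
    print(greedy_nash(groups, budget=5, reward=reward, sizes={"A": 100, "B": 10}))
```

Note the design contrast: water filling only ever looks at the current floor, while the Nash greedy trades off gains across all groups through the size-weighted log objective.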
Related papers
- A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health [29.894488663882328]
Large Language Models (LLMs) have emerged as adept automated planners across domains of robotic control and navigation.
We propose a Decision Language Model (DLM) for RMABs, enabling dynamic fine-tuning of RMAB policies using human-language commands.
arXiv Detail & Related papers (2024-02-22T18:58:27Z)
- Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better. (A generic fixed-budget elimination sketch follows this entry.)
arXiv Detail & Related papers (2023-10-24T09:47:32Z)
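The paper's CSA handles combinatorial (exponentially large) action classes and is substantially more involved; as a rough illustration of the round-based, fixed-budget elimination style it belongs to, here is a generic sketch for the plain best-arm case. The `pull` interface is an assumption.

```python
import random

def fixed_budget_elimination(pull, n_arms, budget):
    """Generic fixed-budget successive elimination -- a simplified stand-in,
    NOT the paper's CSA: split the budget into rounds, sample surviving arms
    equally each round, and drop the empirically worst one.
    pull(a) returns a stochastic reward for arm a (assumed interface)."""
    alive = list(range(n_arms))
    total = {a: 0.0 for a in alive}
    count = {a: 0 for a in alive}
    rounds = max(1, n_arms - 1)
    for _ in range(rounds):
        # Split this round's (approximate) share of the budget over survivors.
        per_arm = max(1, budget // (rounds * len(alive)))
        for a in alive:
            for _ in range(per_arm):
                total[a] += pull(a)
                count[a] += 1
        if len(alive) > 1:
            # Drop the empirically worst surviving arm.
            alive.remove(min(alive, key=lambda a: total[a] / count[a]))
    return alive[0]

# Toy usage with Bernoulli arms:
means = [0.2, 0.5, 0.8]
best = fixed_budget_elimination(lambda a: float(random.random() < means[a]),
                                n_arms=3, budget=300)
```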
- Towards Soft Fairness in Restless Multi-Armed Bandits [8.140037969280716]
Restless multi-armed bandits (RMAB) are a framework for allocating limited resources under uncertainty.
To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint.
We then provide an approach to enforce the soft fairness constraint in RMABs. (An illustrative softmax-selection sketch follows this entry.)
arXiv Detail & Related papers (2022-07-27T07:56:32Z)
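The summary does not spell out the mechanism; one natural illustration of a soft fairness constraint is to sample arms from a softmax over per-arm indices (e.g., Whittle indices), so every arm keeps a nonzero chance of being pulled and none is starved. The indices and temperature below are assumptions, not necessarily the paper's exact procedure.

```python
import math
import random

def soft_fair_select(indices, k, temperature=1.0):
    """Sample k distinct arms with probability proportional to
    exp(index / temperature). Higher-index arms are favored, but every arm
    retains a nonzero selection probability, which rules out the starvation
    the entry above describes. Illustrative reading only."""
    arms = list(range(len(indices)))
    chosen = []
    for _ in range(k):
        weights = [math.exp(indices[a] / temperature) for a in arms]
        pick = random.choices(arms, weights=weights, k=1)[0]
        chosen.append(pick)
        arms.remove(pick)  # sample without replacement
    return chosen

# Toy usage: 5 arms, act on 2 per round.
print(soft_fair_select([0.9, 0.1, 0.4, 0.05, 0.6], k=2))
```

Lower temperatures approach the hard top-k index policy; higher temperatures approach uniform random selection, trading utility for fairness.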
- Efficient Resource Allocation with Fairness Constraints in Restless Multi-Armed Bandits [8.140037969280716]
Restless Multi-Armed Bandits (RMAB) are an apt model to represent decision-making problems in public health interventions.
In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms while maximizing expected value.
arXiv Detail & Related papers (2022-06-08T13:28:29Z)
- Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability. (A generic LUCB-style sketch follows this entry.)
arXiv Detail & Related papers (2021-12-08T02:20:18Z)
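LUCB-style algorithms follow a standard template, sketched below for a single (target) bandit with Hoeffding-style confidence radii. The paper's version additionally exploits the known additive source/target relationship, which this toy omits; the `pull` interface (rewards in [0, 1]) and the radius are assumptions.

```python
import math

def lucb_epsilon_best(pull, n_arms, epsilon, delta, max_pulls=100_000):
    """LUCB template: repeatedly sample the empirical best arm and its
    strongest challenger (highest upper confidence bound); stop once the
    challenger's UCB is within epsilon of the best arm's LCB.
    Assumes n_arms >= 2 and pull(a) returns rewards in [0, 1]."""
    mean = [0.0] * n_arms
    cnt = [0] * n_arms

    def sample(a):
        cnt[a] += 1
        mean[a] += (pull(a) - mean[a]) / cnt[a]

    def radius(a, t):
        # Hoeffding-style confidence radius (one standard choice).
        return math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * cnt[a]))

    for a in range(n_arms):  # pull each arm once to initialize
        sample(a)
    t = n_arms
    while t < max_pulls:
        best = max(range(n_arms), key=lambda a: mean[a])
        challenger = max((a for a in range(n_arms) if a != best),
                         key=lambda a: mean[a] + radius(a, t))
        if mean[challenger] + radius(challenger, t) <= mean[best] - radius(best, t) + epsilon:
            return best
        sample(best)
        sample(challenger)
        t += 2
    return max(range(n_arms), key=lambda a: mean[a])  # budget exhausted
```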
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for successive elimination it is optimal up to logarithmic factors. (A robust-estimator sketch follows this entry.)
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
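The summary does not spell out the estimators; a common robustness device under reward contamination is to replace the empirical mean with a trimmed mean, which can plug into elimination-style loops such as the one sketched earlier in this list. The trimming fraction is an assumption, and the paper's estimators may differ.

```python
def trimmed_mean(samples, trim_frac=0.1):
    """Robust location estimate: sort, drop the top and bottom trim_frac
    fraction of samples, and average the rest, limiting the influence of
    adversarially contaminated rewards. One standard choice."""
    s = sorted(samples)
    k = int(len(s) * trim_frac)
    core = s[k:len(s) - k] or s  # fall back to all samples if too few
    return sum(core) / len(core)

# E.g., a contaminated outlier barely moves the estimate:
print(trimmed_mean([0.5, 0.48, 0.52, 0.51, 0.49, 100.0], trim_frac=0.2))
```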
- Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits [91.8283876874947]
We design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm.
We show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives.
We also show that BoBW-lil'UCB$(\gamma)$ outperforms a competitor in terms of time complexity and regret.
arXiv Detail & Related papers (2021-10-16T17:52:32Z)
- Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits [8.136957953239254]
We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)$^2$B.
The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken.
Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call the Occupancy-Measured-Reward Index Policy. (A generic index-policy step is sketched after this entry.)
arXiv Detail & Related papers (2021-09-20T21:40:12Z)
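Index policies share a simple acting rule once the indices are known; the sketch below shows that generic step. Computing the paper's Occupancy-Measured-Reward indices (via an LP over occupancy measures) is the hard part and is not shown here; the per-arm `costs` are an assumption for the multi-action setting.

```python
def index_policy_step(indices, budget, costs=None):
    """Generic index-policy step: sort arms by index and activate them
    greedily until the activation budget is exhausted. The indices are
    assumed given; this sketch does not compute them."""
    costs = costs or [1] * len(indices)  # unit costs unless specified
    order = sorted(range(len(indices)), key=lambda a: indices[a], reverse=True)
    chosen, spent = [], 0
    for a in order:
        if spent + costs[a] <= budget:
            chosen.append(a)
            spent += costs[a]
    return chosen  # arms to act on this round

# Toy usage: budget of 2 activations over 4 arms.
print(index_policy_step([0.3, 0.9, 0.1, 0.7], budget=2))
```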
- Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems [30.759279275710078]
We propose a new and scalable approach to computing index-based solutions.
We provide algorithms designed to capture index decay without having to solve the costly finite horizon problem.
Our algorithms achieve a more than 150x speed-up over existing methods in these tasks without loss in performance.
arXiv Detail & Related papers (2021-03-08T13:10:31Z)
- Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably. (A simplified batched-UCB sketch follows this entry.)
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
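BatchNeuralUCB's key idea -- refreshing the policy only at batch boundaries -- is independent of the neural reward model, so the sketch below shows it with plain UCB1 scores on a finite-armed bandit as a simplified stand-in. The `pull` interface and the fixed batch schedule are assumptions.

```python
import math

def batched_ucb1(pull, n_arms, horizon, batch_size):
    """UCB1 with batched policy updates: the scores used to choose arms are
    recomputed only every batch_size rounds, so the policy is updated about
    horizon / batch_size times instead of every round. A simplified stand-in
    for BatchNeuralUCB, which replaces these scores with a neural reward
    model plus optimism. Assumes pull(a) returns a stochastic reward."""
    mean = [0.0] * n_arms
    cnt = [0] * n_arms
    for a in range(n_arms):          # initialize: pull each arm once
        cnt[a] = 1
        mean[a] = pull(a)
    pending, scores = [], list(mean)
    for t in range(n_arms, horizon):
        if (t - n_arms) % batch_size == 0:
            for a, r in pending:     # fold the finished batch into statistics
                cnt[a] += 1
                mean[a] += (r - mean[a]) / cnt[a]
            pending = []
            scores = [mean[a] + math.sqrt(2 * math.log(t) / cnt[a])
                      for a in range(n_arms)]
        a = max(range(n_arms), key=lambda i: scores[i])
        pending.append((a, pull(a)))
    return mean, cnt
```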
- Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
We also characterize how this regret depends on the gap between the highest reward and the other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)