IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health
- URL: http://arxiv.org/abs/2412.08463v1
- Date: Wed, 11 Dec 2024 15:28:04 GMT
- Title: IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health
- Authors: Gauri Jain, Pradeep Varakantham, Haifeng Xu, Aparna Taneja, Prashant Doshi, Milind Tambe
- Abstract summary: This paper is the first to present the use of inverse reinforcement learning (IRL) to learn desired rewards for RMABs.
We demonstrate improved outcomes in a maternal and child health telehealth program.
- Score: 52.79219652923714
- License:
- Abstract: Public health practitioners often have the goal of monitoring patients and maximizing patients' time spent in "favorable" or healthy states while being constrained to limited resources. Restless multi-armed bandits (RMABs) are an effective model for this problem, as they help allocate limited resources among many agents under resource constraints, where patients behave differently depending on whether or not they are intervened on. However, RMABs assume the reward function is known. This is unrealistic in many public health settings, because patients face unique challenges and it is impossible for a human to know who is most deserving of an intervention at such a large scale. To address this shortcoming, this paper is the first to present the use of inverse reinforcement learning (IRL) to learn desired rewards for RMABs, and we demonstrate improved outcomes in a maternal and child health telehealth program. First, we allow public health experts to specify their goals at an aggregate or population level and propose an algorithm to design expert trajectories at scale based on those goals. Second, our algorithm WHIRL uses gradient updates to optimize the objective, allowing for efficient and accurate learning of RMAB rewards. Third, we compare with existing baselines and outperform them in both run-time and accuracy. Finally, we evaluate and show the usefulness of WHIRL on thousands of beneficiaries from a real-world maternal and child health setting in India. We publicly release our code here: https://github.com/Gjain234/WHIRL.
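The gradient-update idea behind WHIRL can be illustrated with a minimal toy sketch. Everything below is an illustrative assumption, not the paper's implementation: the RMAB policy is replaced by a differentiable softmax stand-in, and expert goals are reduced to target budget shares per group; the function names (`soft_allocation`, `whirl_style_update`) are hypothetical.

```python
import numpy as np

def soft_allocation(w, budget):
    """Differentiable stand-in for an RMAB policy: each arm receives
    intervention effort proportional to the softmax of its reward weight."""
    p = np.exp(w - w.max())
    return budget * p / p.sum()

def whirl_style_update(w, groups, targets, budget, lr=0.5, steps=2000):
    """Gradient-update the per-arm reward weights until each group's share
    of the budget matches the expert-specified aggregate target."""
    for _ in range(steps):
        alloc = soft_allocation(w, budget)
        grad = np.zeros_like(w)
        for g, target in targets.items():
            mask = (groups == g)
            err = alloc[mask].sum() - target * budget  # over/under-served
            grad[mask] -= err  # push weights down if over-served, up if under
        w = w + lr * grad / len(w)
    return w

rng = np.random.default_rng(0)
n, budget = 40, 10.0
groups = np.array([0] * 20 + [1] * 20)
targets = {0: 0.7, 1: 0.3}  # expert aggregate goal: a 70/30 budget split
w = whirl_style_update(rng.normal(size=n), groups, targets, budget)
alloc = soft_allocation(w, budget)
share0 = alloc[groups == 0].sum() / budget
print(round(share0, 2))
```

The point of the sketch is only the shape of the loop: an aggregate-level expert goal is turned into a differentiable objective, and reward weights are learned by gradient updates rather than specified per patient.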
Related papers
- Optimizing Vital Sign Monitoring in Resource-Constrained Maternal Care: An RL-Based Restless Bandit Approach [31.228987526386558]
Wireless vital sign monitoring devices offer a labor-efficient solution for continuous monitoring.
We devise an allocation algorithm for this problem by modeling it as a variant of the popular Restless Multi-Armed Bandit paradigm.
We demonstrate in simulations that our approach outperforms the best baseline by up to a factor of $4$.
arXiv Detail & Related papers (2024-10-10T21:20:07Z) - Improving Health Information Access in the World's Largest Maternal Mobile Health Program via Bandit Algorithms [24.4450506603579]
This paper focuses on Kilkari, the world's largest mHealth program for maternal and child care.
We present a system called CHAHAK that aims to reduce automated dropouts as well as boost engagement with the program.
arXiv Detail & Related papers (2024-05-14T07:21:49Z) - Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning [33.14258196945301]
We show how to exploit the structure of Restless Multi-Armed Bandits (RMABs) to speed up intervention planning.
We use real-world data from an Indian NGO, ARMMAN, to show that our approach is up to two orders of magnitude faster than the state-of-the-art approach.
arXiv Detail & Related papers (2024-03-08T21:31:00Z) - A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health [29.894488663882328]
Large Language Models (LLMs) have emerged as adept automated planners across domains of robotic control and navigation.
We propose a Decision Language Model (DLM) for RMABs, enabling dynamic fine-tuning of RMAB policies using human-language commands.
arXiv Detail & Related papers (2024-02-22T18:58:27Z) - REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Deep Reinforcement Learning for Efficient and Fair Allocation of Health Care Resources [47.57108369791273]
Scarcity of health care resources could result in the unavoidable consequence of rationing.
There is no universally accepted standard for health care resource allocation protocols.
We propose a transformer-based deep Q-network to integrate the disease progression of individual patients and the interaction effects among patients.
arXiv Detail & Related papers (2023-09-15T17:28:06Z) - Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health [23.762981395335217]
Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources.
RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health.
We study equitable objectives for RMABs for the first time. We consider two equity-aligned objectives from the fairness literature, minimax reward and max Nash welfare.
We develop efficient algorithms for solving each -- a water filling algorithm for the former, and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes.
arXiv Detail & Related papers (2023-08-17T13:00:27Z) - Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Practical Challenges in Differentially-Private Federated Survival Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the process of training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, this noise makes it harder for the models to converge.
We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z) - Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
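The water filling idea mentioned in the Equitable Restless Multi-Armed Bandits entry above can be sketched for a minimax objective. This is a hedged toy formulation, not that paper's actual algorithm: the concave per-group reward curves and the one-unit-at-a-time loop are illustrative assumptions.

```python
import heapq

def water_fill(reward_curves, budget):
    """Minimax 'water filling': reward_curves maps group -> list where
    list[k] is the group's reward after receiving k units.  Each budget
    unit goes to the group whose current reward is lowest, raising the
    minimum like water filling the lowest basin.  Returns units per group."""
    alloc = {g: 0 for g in reward_curves}
    heap = [(curve[0], g) for g, curve in reward_curves.items()]
    heapq.heapify(heap)
    for _ in range(budget):
        reward, g = heapq.heappop(heap)
        curve = reward_curves[g]
        if alloc[g] + 1 < len(curve):
            alloc[g] += 1
            heapq.heappush(heap, (curve[alloc[g]], g))
        else:
            heapq.heappush(heap, (float("inf"), g))  # group saturated
    return alloc

curves = {
    "A": [0.1, 0.4, 0.6, 0.7],   # starts worst off, gains fast
    "B": [0.5, 0.6, 0.65, 0.7],  # already fairly well served
}
print(water_fill(curves, 3))
```

With these toy curves the first two units go to group A (the worst off) and the third to group B, equalizing both groups at reward 0.6, which is exactly the minimax behavior the greedy water-filling rule targets.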
This list is automatically generated from the titles and abstracts of the papers in this site.