Optimizing Vital Sign Monitoring in Resource-Constrained Maternal Care: An RL-Based Restless Bandit Approach
- URL: http://arxiv.org/abs/2410.08377v1
- Date: Thu, 10 Oct 2024 21:20:07 GMT
- Title: Optimizing Vital Sign Monitoring in Resource-Constrained Maternal Care: An RL-Based Restless Bandit Approach
- Authors: Niclas Boehmer, Yunfan Zhao, Guojun Xiong, Paula Rodriguez-Diaz, Paola Del Cueto Cibrian, Joseph Ngonzi, Adeline Boatin, Milind Tambe
- Abstract summary: Wireless vital sign monitoring devices offer a labor-efficient solution for continuous monitoring.
We devise an allocation algorithm for this problem by modeling it as a variant of the popular Restless Multi-Armed Bandit paradigm.
We demonstrate in simulations that our approach outperforms the best heuristic baseline by up to a factor of $4$.
- Score: 31.228987526386558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maternal mortality remains a significant global public health challenge. One promising approach to reducing maternal deaths occurring during facility-based childbirth is through early warning systems, which require the consistent monitoring of mothers' vital signs after giving birth. Wireless vital sign monitoring devices offer a labor-efficient solution for continuous monitoring, but their scarcity raises the critical question of how to allocate them most effectively. We devise an allocation algorithm for this problem by modeling it as a variant of the popular Restless Multi-Armed Bandit (RMAB) paradigm. In doing so, we identify and address novel, previously unstudied constraints unique to this domain, which render previous approaches for RMABs unsuitable and significantly increase the complexity of the learning and planning problem. To overcome these challenges, we adopt the popular Proximal Policy Optimization (PPO) algorithm from reinforcement learning to learn an allocation policy by training a policy and value function network. We demonstrate in simulations that our approach outperforms the best heuristic baseline by up to a factor of $4$.
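The core modeling move above, monitor allocation as a restless bandit, can be pictured with a toy simulation. The sketch below is not the authors' model (their state space, domain constraints, and PPO-trained policy and value networks are far richer); it only shows the RMAB skeleton: N arms (mothers) with a hypothetical two-state vitals model, a budget of B monitors per round, and a myopic heuristic of the kind the paper uses as a baseline. All sizes and probabilities are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

N, B, T = 10, 3, 50  # hypothetical: mothers (arms), monitors (budget), rounds
# Toy two-state vitals model per arm (state 1 = "stable").
# P[a, s, s'] = transition probability under action a (1 = monitored).
P = np.array([
    [[0.7, 0.3],      # unmonitored
     [0.4, 0.6]],
    [[0.5, 0.5],      # monitored: better odds of reaching/keeping state 1
     [0.05, 0.95]],
])

state = rng.integers(0, 2, size=N)
total = 0.0
for _ in range(T):
    # Myopic heuristic: monitor the B arms with the largest one-step
    # gain in the probability of being stable next round.
    gain = P[1, state, 1] - P[0, state, 1]
    action = np.zeros(N, dtype=int)
    action[np.argsort(-gain)[:B]] = 1
    # "Restless": every arm transitions each round, monitored or not.
    state = (rng.random(N) < P[action, state, 1]).astype(int)
    total += state.sum()

print(f"average number of stable mothers per round: {total / T:.2f}")
```

The restlessness (passive arms keep evolving) together with the domain-specific allocation constraints the abstract mentions is what makes exact planning intractable and motivates learning the policy with PPO instead.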
Related papers
- Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program [36.10003434625494]
Mobile health (mHealth) programs face a critical challenge in optimizing the timing of automated health information calls to beneficiaries.
We propose a principled approach using Thompson Sampling for this collaborative bandit problem.
We demonstrate significant improvements over state-of-the-art baselines on a real-world dataset from the world's largest maternal mHealth program.
arXiv Detail & Related papers (2024-10-28T18:08:18Z)
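As background for the entry above: the basic Beta-Bernoulli Thompson Sampling loop for picking call-time slots looks as follows. This independent-arm toy omits the paper's collaborative structure, which shares information across beneficiaries; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

K, T = 5, 2000                      # call-time slots (arms), rounds
true_p = rng.uniform(0.1, 0.6, K)   # unknown pickup probability per slot
alpha = np.ones(K)                  # Beta posterior parameters per arm
beta = np.ones(K)

for _ in range(T):
    theta = rng.beta(alpha, beta)   # sample one plausible value per arm
    arm = int(np.argmax(theta))     # call at the most promising slot
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward            # Bayesian update of the chosen arm
    beta[arm] += 1 - reward

print("estimated pickup rates:", np.round(alpha / (alpha + beta), 2))
print("true pickup rates:     ", np.round(true_p, 2))
```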
- Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [46.2482873419289]
We introduce a deep Q-learning approach to obtain more reliable critical care policies.
We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units.
arXiv Detail & Related papers (2023-06-13T18:02:57Z)
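One plausible reading of pruning for reliable policies, sketched below as a guess rather than the paper's actual algorithm: keep separate value estimates for return and for risk, and restrict the greedy choice to actions whose estimated risk is under a cap. The helper and numbers are hypothetical.

```python
import numpy as np

def safe_greedy(q_reward, q_risk, risk_cap):
    """Pick the highest-reward action among those whose estimated risk
    stays under a cap; fall back to the least risky action if none do."""
    allowed = np.flatnonzero(q_risk <= risk_cap)
    if allowed.size == 0:
        return int(np.argmin(q_risk))
    return int(allowed[np.argmax(q_reward[allowed])])

q_reward = np.array([0.2, 0.9, 0.5])   # toy per-action value estimates
q_risk = np.array([0.1, 0.6, 0.2])     # toy per-action risk estimates
print(safe_greedy(q_reward, q_risk, risk_cap=0.3))  # -> 2, not the risky 1
```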
- Latent State Marginalization as a Low-cost Approach for Improving Exploration [79.12247903178934]
We propose the adoption of latent variable policies within the MaxEnt framework.
We show that latent variable policies naturally emerge under the use of world models with a latent belief state.
We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training.
arXiv Detail & Related papers (2022-10-03T15:09:12Z)
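The marginalization idea in the entry above can be pictured in one dimension: if the policy is a mixture over K latent modes, its action log-probability can be computed exactly with a log-sum-exp over modes instead of sampling a single mode. A toy numpy sketch (the names and numbers are illustrative, not the paper's world-model setup):

```python
import numpy as np

def marginal_log_prob(action, means, log_stds, log_weights):
    """log pi(a|s) for a policy with K latent Gaussian modes: marginalize
    the latent variable exactly (logsumexp) instead of sampling one mode."""
    stds = np.exp(log_stds)
    comp = (-0.5 * ((action - means) / stds) ** 2
            - log_stds - 0.5 * np.log(2 * np.pi))  # per-mode log-density
    return np.logaddexp.reduce(log_weights + comp)

# Two latent modes over a 1-D action (toy numbers).
print(marginal_log_prob(0.3,
                        means=np.array([0.0, 1.0]),
                        log_stds=np.array([-1.0, -1.0]),
                        log_weights=np.log([0.5, 0.5])))
```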
- Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health [36.442133189056136]
This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features.
The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions.
Models trained purely for predictive accuracy can be misaligned with the quality of the decisions they induce; to address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality.
arXiv Detail & Related papers (2022-02-02T08:36:10Z)
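The Whittle index policy referenced above gives each arm a state-dependent index, the passive subsidy at which acting stops being strictly better, and activates the arms with the largest indices. A minimal sketch for a single two-state arm with reward r(s) = s, assuming indexability and using binary search over the subsidy (all parameters are illustrative):

```python
import numpy as np

def whittle_index(P, s, gamma=0.95, lo=-2.0, hi=2.0, iters=40):
    """Whittle index of state s: the passive subsidy lam at which acting
    and staying passive are equally valuable (binary search over lam,
    with value iteration solving the subsidized MDP at each candidate)."""
    def q_values(lam):
        V = np.zeros(2)
        for _ in range(300):
            Q = np.empty((2, 2))
            for st in range(2):
                for a in range(2):
                    Q[st, a] = st + lam * (a == 0) + gamma * P[a, st] @ V
            V = Q.max(axis=1)
        return Q
    for _ in range(iters):
        lam = (lo + hi) / 2
        Q = q_values(lam)
        if Q[s, 1] > Q[s, 0]:
            lo = lam   # subsidy too small: acting still preferred
        else:
            hi = lam
    return (lo + hi) / 2

P = np.array([[[0.9, 0.1], [0.6, 0.4]],    # passive dynamics
              [[0.6, 0.4], [0.2, 0.8]]])   # active dynamics
print(whittle_index(P, s=0), whittle_index(P, s=1))
```

Decision-focused learning then trains the transition-predicting model so that the quality of the induced index policy, not raw prediction error, is what improves.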
- Contingency-Aware Influence Maximization: A Reinforcement Learning Approach [52.109536198330126]
The influence maximization (IM) problem aims at finding a subset of seed nodes in a social network that maximizes the spread of influence.
In this study, we focus on a sub-class of IM problems in which it is uncertain whether a node, once invited, is willing to act as a seed, called contingency-aware IM.
Despite the initial success, a major practical obstacle in promoting the solutions to more communities is the tremendous runtime of the greedy algorithms.
arXiv Detail & Related papers (2021-06-13T16:42:22Z)
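For context on the runtime obstacle the entry above mentions: the classic greedy algorithm adds, at each step, the seed with the largest marginal spread, and each candidate evaluation is itself a Monte Carlo simulation of (here) the independent cascade model, which is what makes it slow on large networks. A toy version with a hypothetical five-node graph:

```python
import numpy as np

rng = np.random.default_rng(4)

def spread(graph, seeds, p=0.1, trials=200):
    """Monte Carlo estimate of expected spread under independent cascade."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials

def greedy_im(graph, k):
    """Greedy seed selection: every step re-estimates the spread of
    every remaining candidate, hence the tremendous runtime at scale."""
    seeds = []
    for _ in range(k):
        best = max((u for u in range(len(graph)) if u not in seeds),
                   key=lambda u: spread(graph, seeds + [u]))
        seeds.append(best)
    return seeds

graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]  # tiny adjacency list
print(greedy_im(graph, k=2))
```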
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
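The expected-performance-versus-risk trade-off in PG-BROIL can be pictured, very loosely, as optimizing a blend of mean return and conditional value at risk (CVaR) over a posterior of reward hypotheses. The function below is a hedged sketch of such an objective, not the paper's exact formulation:

```python
import numpy as np

def broil_objective(returns, alpha=0.95, lam=0.5):
    """Blend expected return with tail risk (CVaR) over sampled reward
    hypotheses; lam = 1 is risk-neutral, lam = 0 is fully risk-averse."""
    var = np.quantile(returns, 1 - alpha)       # value-at-risk cutoff
    cvar = returns[returns <= var].mean()       # mean of the worst tail
    return lam * returns.mean() + (1 - lam) * cvar

# returns[i] = return of one candidate policy under the i-th sampled
# reward function (toy numbers).
returns = np.random.default_rng(5).normal(1.0, 0.5, size=1000)
print(broil_objective(returns))
```

Sweeping lam then yields the family of behaviors, from risk-neutral to risk-averse, that the summary above describes.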
- Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare [39.41918282603752]
We propose a Whittle index based Q-Learning mechanism for restless multi-armed bandit (RMAB) problems.
Our method improves over existing learning-based methods for RMABs on multiple benchmarks from the literature, as well as on the maternal healthcare dataset.
arXiv Detail & Related papers (2021-05-17T15:44:55Z)
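A rough tabular rendering of the Whittle-index-flavored Q-learning idea from the entry above: learn per-arm Q-values online and activate the B arms whose estimated advantage of acting, Q(s, 1) - Q(s, 0), is largest. The dynamics, sizes, and schedules below are all illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
N, S, B, T = 8, 2, 2, 5000        # arms, states, budget, rounds
Q = np.zeros((N, S, 2))           # per-arm tabular Q-values
state = rng.integers(0, S, N)
# Hypothetical ground-truth dynamics, unknown to the learner.
P = np.array([[[0.8, 0.2], [0.5, 0.5]],
              [[0.5, 0.5], [0.1, 0.9]]])
arms = np.arange(N)

for t in range(1, T + 1):
    eps = max(0.05, t ** -0.5)    # decaying exploration
    lr = t ** -0.5                # decaying learning rate
    # Per-arm "index": estimated advantage of acting in the current state.
    idx = Q[arms, state, 1] - Q[arms, state, 0]
    action = np.zeros(N, dtype=int)
    if rng.random() < eps:
        action[rng.choice(N, B, replace=False)] = 1
    else:
        action[np.argsort(-idx)[:B]] = 1
    nxt = (rng.random(N) < P[action, state, 1]).astype(int)
    reward = nxt                  # reward 1 for landing in the good state
    target = reward + 0.9 * Q[arms, nxt].max(axis=1)
    Q[arms, state, action] += lr * (target - Q[arms, state, action])
    state = nxt

print(np.round(Q[0], 2))          # learned Q-table of the first arm
```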
- Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems [30.759279275710078]
We propose a new and scalable approach to computing index-based solutions.
We provide algorithms designed to capture index decay without having to solve the costly finite horizon problem.
Our algorithms achieve a more than 150x speed-up over existing methods on these tasks without loss in performance.
arXiv Detail & Related papers (2021-03-08T13:10:31Z)
- Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
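The decentralized pricing idea can be illustrated by dual ascent on the Lagrange multiplier of the capacity constraint: raise the resource price when aggregate load exceeds capacity, lower it otherwise, while each agent best-responds to the posted price. A toy sketch with hypothetical utilities:

```python
import numpy as np

rng = np.random.default_rng(3)
n_agents, capacity = 20, 8.0
value = rng.uniform(0.2, 1.0, n_agents)  # each agent's private utility
price, step = 0.0, 0.05

for _ in range(300):
    # Decentralized best response: use the resource iff utility > price.
    load = float((value > price).sum())
    # Dual ascent on the Lagrange multiplier of the capacity constraint.
    price = max(0.0, price + step * (load - capacity))

print(f"price {price:.2f}, load {(value > price).sum()} / {capacity:g}")
```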
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent perceives its environment through state observations, which may contain natural measurement errors or adversarial noise.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques for improving robustness in classification tasks, such as adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
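To see concretely why perturbed observations matter, the probe below checks whether some L-infinity-bounded perturbation can flip a linear policy's greedy action, with random search standing in for a real adversary such as projected gradient descent. The policy matrix and observation are toy values:

```python
import numpy as np

def action_can_flip(W, obs, eps, trials=1000, seed=0):
    """Probe whether an observation perturbation with L-infinity norm
    <= eps changes a linear policy's greedy action (random-search
    adversary; a real attack would use gradients)."""
    rng = np.random.default_rng(seed)
    clean = int(np.argmax(W @ obs))
    for _ in range(trials):
        noisy = obs + rng.uniform(-eps, eps, size=obs.shape)
        if int(np.argmax(W @ noisy)) != clean:
            return True
    return False

W = np.array([[1.0, 0.0], [0.9, 0.2]])   # toy 2-action linear policy
obs = np.array([1.0, 0.1])
print(action_can_flip(W, obs, eps=0.05), action_can_flip(W, obs, eps=0.5))
```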
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.