A Robust Mixed-Effects Bandit Algorithm for Assessing Mobile Health Interventions
- URL: http://arxiv.org/abs/2312.06403v3
- Date: Fri, 7 Jun 2024 01:18:16 GMT
- Title: A Robust Mixed-Effects Bandit Algorithm for Assessing Mobile Health Interventions
- Authors: Easton K. Huch, Jieru Shi, Madeline R. Abbott, Jessica R. Golbus, Alexander Moreno, Walter H. Dempsey
- Abstract summary: "DML-TS-NNR" is a contextual bandit algorithm for mobile health interventions.
It addresses challenges like participant heterogeneity, nonstationarity, and nonlinearity in rewards.
We demonstrate the superior performance of the DML-TS-NNR algorithm in a simulation and two off-policy evaluation studies.
- Score: 39.8207428422509
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mobile health leverages personalized, contextually-tailored interventions optimized through bandit and reinforcement learning algorithms. Despite its promise, challenges like participant heterogeneity, nonstationarity, and nonlinearity in rewards hinder algorithm performance. We propose a robust contextual bandit algorithm, termed "DML-TS-NNR", that simultaneously addresses these challenges via (1) modeling the differential reward with user- and time-specific incidental parameters, (2) network cohesion penalties, and (3) debiased machine learning for flexible estimation of baseline rewards. We establish a high-probability regret bound that depends solely on the dimension of the differential reward model. This feature enables us to achieve robust regret bounds even when the baseline reward is highly complex. We demonstrate the superior performance of the DML-TS-NNR algorithm in a simulation and two off-policy evaluation studies.
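The idea in the abstract of modeling only the differential reward while treating the baseline reward as a nuisance can be illustrated with a small sketch. This is not the DML-TS-NNR algorithm itself: it omits the user- and time-specific parameters and the network cohesion penalties, uses a scalar feature, and substitutes a standard inverse-probability-weighted pseudo-outcome for the paper's debiased estimator. All names here (`pseudo_outcome`, `ScalarDifferentialTS`, `simulate`) and all numeric settings are hypothetical, chosen only for illustration.

```python
import math
import random


def pseudo_outcome(reward, action, prob, baseline_hat):
    """Inverse-probability-weighted pseudo-outcome (illustrative assumption).

    If reward = baseline + action * effect + noise and action ~ Bernoulli(prob),
    the expectation of this quantity equals the effect for any baseline_hat.
    A better baseline_hat only reduces its variance.
    """
    return (action - prob) * (reward - baseline_hat) / (prob * (1.0 - prob))


class ScalarDifferentialTS:
    """Gaussian Thompson-sampling-style learner for a scalar effect theta * x."""

    def __init__(self, prior_precision=1.0, noise_scale=1.0, p_min=0.1, p_max=0.9):
        self.precision = prior_precision  # prior precision plus sum of x^2
        self.weighted_sum = 0.0           # sum of x * pseudo-outcome
        self.noise_scale = noise_scale
        self.p_min = p_min
        self.p_max = p_max

    def posterior(self):
        """Return the posterior mean and standard deviation of theta."""
        mean = self.weighted_sum / self.precision
        std = self.noise_scale / math.sqrt(self.precision)
        return mean, std

    def action_prob(self, x):
        """Posterior probability that x * theta > 0, clipped to [p_min, p_max]."""
        if x == 0.0:
            return 0.5
        mean, std = self.posterior()
        z = (x * mean) / (abs(x) * std)
        prob = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        return min(self.p_max, max(self.p_min, prob))

    def update(self, x, pseudo):
        """Ridge-style update treating the pseudo-outcome as the response."""
        self.precision += x * x
        self.weighted_sum += x * pseudo


def simulate(steps=500, theta=0.5, seed=0):
    """Run the learner on synthetic data with a nonlinear baseline reward."""
    rng = random.Random(seed)
    agent = ScalarDifferentialTS()
    for _ in range(steps):
        x = rng.uniform(0.5, 1.5)
        prob = agent.action_prob(x)
        action = 1 if rng.random() < prob else 0
        baseline = math.sin(3.0 * x)  # nonlinear baseline the learner never models
        reward = baseline + action * theta * x + rng.gauss(0.0, 0.1)
        baseline_hat = 0.0            # deliberately crude baseline estimate
        agent.update(x, pseudo_outcome(reward, action, prob, baseline_hat))
    return agent
```

Because the pseudo-outcome is centered on the randomization probability, the learner's target does not depend on how well the baseline is estimated, which loosely mirrors the abstract's point that the regret bound depends only on the dimension of the differential reward model. Clipping the action probability away from 0 and 1 keeps the weights in `pseudo_outcome` bounded.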
Related papers
- Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex.
Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs.
We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z) - Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection [1.0358639819750703]
In unsupervised anomaly detection (UAD) research, it is necessary to develop a computationally efficient and scalable solution.
We revisit the reconstruction-by-inpainting approach and consider how to improve it by analyzing its strengths and weaknesses.
We propose Feature Attenuation of Defective Representation (FADeR), which employs only two layers to attenuate the feature information of anomaly reconstruction.
arXiv Detail & Related papers (2024-07-05T15:44:53Z) - Reward-Punishment Reinforcement Learning with Maximum Entropy [3.123049150077741]
We introduce the "soft Deep MaxPain" (softDMP) algorithm, which integrates the optimization of long-term policy entropy into reward-punishment reinforcement learning objectives.
Our motivation is to facilitate a smoother variation of operators utilized in the updating of action values beyond traditional "max" and "min" operators.
arXiv Detail & Related papers (2024-05-20T05:05:14Z) - POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning [17.644279061872442]
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning.
We propose the Potentially Optimal Joint Actions Weighted Qmix (POWQmix) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses during training.
Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
arXiv Detail & Related papers (2024-05-13T03:27:35Z) - Prior Constraints-based Reward Model Training for Aligning Large Language Models [58.33118716810208]
This paper proposes a Prior Constraints-based Reward Model (namely PCRM) training method to mitigate this problem.
PCRM incorporates prior constraints, specifically, length ratio and cosine similarity between outputs of each comparison pair, during reward model training to regulate optimization magnitude and control score margins.
Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling.
arXiv Detail & Related papers (2024-04-01T07:49:11Z) - Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts [11.765000124617186]
We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage optimization problems.
We show that our algorithm is superior to risk-neutral Soft Actor-Critic as well as to two benchmark approaches for robust deep reinforcement learning.
arXiv Detail & Related papers (2024-02-15T14:55:38Z) - Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z) - Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike fixed exploration and exploitation balance, caution and probing are employed automatically by the controller in real-time, even after the learning process is terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z) - Deep Learning Meets Adaptive Filtering: A Stein's Unbiased Risk Estimator Approach [13.887632153924512]
We introduce task-based deep learning frameworks, denoted as Deep RLS and Deep EASI.
These architectures transform the iterations of the original algorithms into layers of a deep neural network, enabling efficient source signal estimation.
To further enhance performance, we propose training these deep unrolled networks utilizing a surrogate loss function grounded on Stein's unbiased risk estimator (SURE).
arXiv Detail & Related papers (2023-07-31T14:26:41Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Evolving Constrained Reinforcement Learning Policy [5.4444944707433525]
We propose a novel evolutionary constrained reinforcement learning algorithm, which adaptively balances the reward and constraint violation with ranking.
Experiments on robotic control benchmarks show that our ECRL achieves outstanding performance compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2023-04-19T03:54:31Z) - Anti-Exploration by Random Network Distillation [63.04360288089277]
We show that a naive choice of conditioning for the Random Network Distillation (RND) is not discriminative enough to be used as an uncertainty estimator.
We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM).
We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
arXiv Detail & Related papers (2023-01-31T13:18:33Z) - Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing [10.69955834942979]
We consider risk-aware bandit optimization with applications in smart order routing (SOR).
Driven by the variance-minimizing globally-optimal (G-optimal) design, we propose the novel instance-independent Risk-Aware Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware Successive Elimination (RISE++) algorithm.
arXiv Detail & Related papers (2022-08-04T00:21:10Z) - Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z) - On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications [0.0]
Predictive uncertainty supplements model predictions and enables improved functionality of downstream tasks.
We tackle this problem by building upon Monte Carlo Dropout (MCDO) models using the Axolotl framework.
We conduct experiments on (1) a multi-class classification task using the CIFAR10 dataset, and (2) a more complex human body segmentation task.
arXiv Detail & Related papers (2021-11-11T22:24:15Z) - Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.