Did we personalize? Assessing personalization by an online reinforcement
learning algorithm using resampling
- URL: http://arxiv.org/abs/2304.05365v6
- Date: Mon, 7 Aug 2023 15:39:37 GMT
- Title: Did we personalize? Assessing personalization by an online reinforcement
learning algorithm using resampling
- Authors: Susobhan Ghosh, Raphael Kim, Prasidh Chhabria, Raaz Dwivedi, Predrag
Klasnja, Peng Liao, Kelly Zhang, Susan Murphy
- Abstract summary: Reinforcement learning (RL) can be used to personalize sequences of treatments in digital health to support users in adopting healthier behaviors.
Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses.
We assess whether the RL algorithm should be included in an ``optimized'' intervention for real-world deployment.
- Score: 9.745543921550748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing interest in using reinforcement learning (RL) to
personalize sequences of treatments in digital health to support users in
adopting healthier behaviors. Such sequential decision-making problems involve
decisions about when to treat and how to treat based on the user's context
(e.g., prior activity level, location, etc.). Online RL is a promising
data-driven approach for this problem as it learns based on each user's
historical responses and uses that knowledge to personalize these decisions.
However, to decide whether the RL algorithm should be included in an
``optimized'' intervention for real-world deployment, we must assess the data
evidence indicating that the RL algorithm is actually personalizing the
treatments to its users. Due to the stochasticity in the RL algorithm, one may
get a false impression that it is learning in certain states and using this
learning to provide specific treatments. We use a working definition of
personalization and introduce a resampling-based methodology for investigating
whether the personalization exhibited by the RL algorithm is an artifact of the
RL algorithm stochasticity. We illustrate our methodology with a case study by
analyzing the data from a physical activity clinical trial called HeartSteps,
which included the use of an online RL algorithm. We demonstrate how our
approach enhances data-driven truth-in-advertising of algorithm personalization
both across all users as well as within specific users in the study.
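
To make the idea concrete, here is a minimal, self-contained sketch of a resampling-style check in the spirit of the approach described above. It is not the paper's actual procedure (which re-runs the full online RL algorithm under its working definition of personalization); the toy data, the binary context, and the between-context treatment-frequency statistic are all hypothetical stand-ins.

```python
# Hedged sketch: could the apparent "personalization" of a stochastic RL
# policy be explained by its action-selection noise alone?  Toy data only.
import numpy as np

rng = np.random.default_rng(0)

# Toy logged data for one user: a binary context (e.g., low vs. high prior
# activity), the randomization probabilities used at each decision time
# (constructed here so they do depend on context), and the treatments drawn.
T = 200
context = rng.integers(0, 2, size=T)                 # 0 = low, 1 = high activity
action_prob = np.clip(0.3 + 0.25 * context, 0.1, 0.8)
treatment = rng.binomial(1, action_prob)

def personalization_stat(actions, ctx):
    """Between-context difference in treatment frequency."""
    return actions[ctx == 1].mean() - actions[ctx == 0].mean()

observed = personalization_stat(treatment, context)

# Null resampling: if the algorithm were NOT personalizing, its action
# probabilities would not depend on context, so redraw treatments from the
# pooled (context-independent) probability and recompute the statistic.
B = 5000
pooled_prob = action_prob.mean()
null_stats = np.array([
    personalization_stat(rng.binomial(1, pooled_prob, size=T), context)
    for _ in range(B)
])

# Monte Carlo tail probability: how often would algorithm stochasticity
# alone produce a between-context difference at least this large?
p_value = (np.sum(np.abs(null_stats) >= abs(observed)) + 1) / (B + 1)
print(f"observed difference: {observed:.3f}, resampling p-value: {p_value:.4f}")
```

A small p-value in this toy suggests the observed between-context difference in treatment is unlikely to be an artifact of action-selection stochasticity alone.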
Related papers
- Monitoring Fidelity of Online Reinforcement Learning Algorithms in Clinical Trials [20.944037982124037]
This paper proposes algorithm fidelity as a critical requirement for deploying online RL algorithms in clinical trials.
We present a framework for pre-deployment planning and real-time monitoring to help algorithm developers and clinical researchers ensure algorithm fidelity.
arXiv Detail & Related papers (2024-02-26T20:19:14Z)
- Recommending the optimal policy by learning to act from temporal data [2.554326189662943]
This paper proposes an AI-based approach that learns, by means of Reinforcement Learning (RL), an optimal policy from temporal execution data.
The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches.
The ability of our approach to match, and often outperform, Deep RL approaches contributes towards the exploitation of white-box RL techniques in scenarios where only temporal execution data are available.
arXiv Detail & Related papers (2023-03-16T10:30:36Z)
- Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
- Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies [6.303272140868826]
Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces.
Current deep RL algorithms require a tremendous number of environment interactions to learn.
Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data.
arXiv Detail & Related papers (2022-12-15T20:36:10Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
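
As a rough illustration of how a contrastive objective can play the role of a goal-conditioned critic, the sketch below computes an InfoNCE loss over (state-action, future-goal) pairs. The linear encoders, batch construction, and dimensions are hypothetical stand-ins, not the paper's implementation.

```python
# Hypothetical sketch: an InfoNCE-style contrastive critic whose similarity
# scores play the role of a goal-conditioned value.
import numpy as np

rng = np.random.default_rng(1)

def encode(x, W):
    """Stand-in linear encoder with unit-norm outputs; a real
    implementation would use a neural network."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

batch, feat_dim, embed_dim = 32, 8, 16
W_sa = rng.standard_normal((feat_dim, embed_dim))   # state-action encoder
W_g = rng.standard_normal((feat_dim, embed_dim))    # goal encoder

state_action = rng.standard_normal((batch, feat_dim))
# Toy stand-in for "goals reached later in the same trajectory": a noisy
# copy of the state-action features.
future_goal = state_action + 0.1 * rng.standard_normal((batch, feat_dim))

# Pairwise similarity scores; the diagonal holds the positive pairs.
logits = encode(state_action, W_sa) @ encode(future_goal, W_g).T

# InfoNCE: each state-action should rank its own future goal above the
# goals sampled from other trajectories in the batch.
log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_softmax))
print(f"contrastive (InfoNCE) loss: {loss:.3f}")
```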
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with sample complexity guarantees.
We give a theoretical guarantee for the proposed algorithm, where the suboptimality of the learned policies is comparable to the rate achieved as if the data were not distributed.
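
A minimal, generic sketch of the federated flavour of this setup is shown below: each site fits a local linear Q-style regression on its own offline data and shares only parameters, which a server averages. This is FedAvg-style toy code under stated assumptions, not the paper's algorithm or its sample-complexity analysis.

```python
# Hypothetical federated-averaging sketch over per-site offline datasets.
import numpy as np

rng = np.random.default_rng(4)

def local_fit(X, y, w_init, lr=0.1, epochs=20):
    """Local least-squares regression by gradient descent from the current
    global parameters (targets y stand in for Bellman targets)."""
    w = w_init.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

d, n_sites = 5, 3
w_global = np.zeros(d)

# Heterogeneous synthetic offline datasets, one per site.
site_data = []
for k in range(n_sites):
    X = rng.standard_normal((100, d))
    y = X @ (np.ones(d) + 0.1 * k) + 0.1 * rng.standard_normal(100)
    site_data.append((X, y))

for round_ in range(10):
    local_ws = [local_fit(X, y, w_global) for X, y in site_data]
    w_global = np.mean(local_ws, axis=0)     # server-side averaging

print("federated parameter estimate:", np.round(w_global, 2))
```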
arXiv Detail & Related papers (2022-06-11T18:03:26Z)
- Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-implementation Guidelines [24.283342018185028]
Online reinforcement learning algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education.
Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints.
We extend the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning.
arXiv Detail & Related papers (2022-06-08T15:05:28Z)
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? [86.43517734716606]
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction.
Behavioral cloning (BC) algorithms mimic a subset of the dataset via supervised learning.
We show that RL policies trained on sufficiently noisy suboptimal data can attain better performance than even BC algorithms trained with expert data.
arXiv Detail & Related papers (2022-04-12T08:25:34Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
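
The hypothetical sketch below illustrates the two-policy mechanism described above: a guide policy rolls in for the first part of each episode, an exploration policy takes over, and the roll-in length is shrunk as performance stays high. The toy chain environment, the scripted guide, and the threshold are assumptions for illustration, not the authors' implementation.

```python
# Hedged toy sketch of a guide-policy roll-in with a shrinking curriculum.
import random

random.seed(0)
HORIZON = 50

def guide_policy(state):
    """Stand-in for a competent prior policy (always moves right)."""
    return 1

def exploration_policy(state):
    """Stand-in for the policy being trained (here: random)."""
    return random.choice([0, 1])

def env_step(state, action):
    """Toy chain environment: reach state 10 to get a reward of 1."""
    next_state = state + (1 if action == 1 else 0)
    done = next_state >= 10
    return next_state, (1.0 if done else 0.0), done

def rollout(rollin_steps):
    """Guide policy acts for the first rollin_steps, then the learner."""
    state, ret = 0, 0.0
    for t in range(HORIZON):
        policy = guide_policy if t < rollin_steps else exploration_policy
        state, reward, done = env_step(state, policy(state))
        ret += reward
        if done:
            break
    return ret

# Curriculum: shrink the guide roll-in whenever performance stays high,
# handing progressively more of each episode to the exploration policy.
rollin = HORIZON
for it in range(10):
    avg_return = sum(rollout(rollin) for _ in range(20)) / 20
    print(f"iter {it}: roll-in={rollin}, avg return={avg_return:.2f}")
    if avg_return >= 0.5:                      # hypothetical threshold
        rollin = max(0, rollin - 5)
```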
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation [15.451690870640295]
In some scenarios, such as healthcare, usually only a few records are available for each patient, impeding the application of current reinforcement learning algorithms.
We propose a data-efficient RL algorithm that exploits structural causal models (SCMs) to model the state dynamics.
We show that counterfactual outcomes are identifiable under mild conditions and that Q-learning on the counterfactual-based augmented data set converges to the optimal value function.
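
The sketch below illustrates the counterfactual-augmentation idea under a simple additive-noise structural causal model s' = f(s, a) + u: recover the noise from an observed transition (abduction), replay it under the action not taken, and add the resulting transition to the dataset. The mechanism f and the single toy transition are hypothetical, and the paper's identifiability conditions and estimator are not reproduced here.

```python
# Hedged sketch of abduction-action-prediction counterfactual augmentation.
import numpy as np

rng = np.random.default_rng(2)

def f(state, action):
    """Assumed (here: known) structural mechanism for the next state."""
    return 0.8 * state + (0.5 if action == 1 else -0.5)

# One observed transition (s, a, s').
s, a = 1.0, 1
u = rng.normal(scale=0.1)          # exogenous noise, unknown in practice
s_next = f(s, a) + u

# Abduction: recover the noise consistent with the observed transition.
u_hat = s_next - f(s, a)

# Action + prediction: replay the same noise under the alternative action.
a_cf = 1 - a
s_next_cf = f(s, a_cf) + u_hat

augmented = [(s, a, s_next), (s, a_cf, s_next_cf)]   # factual + counterfactual
print(augmented)
```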
arXiv Detail & Related papers (2020-12-16T17:21:13Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
The discovered rule specifies both 'what to predict' (e.g., value functions) and 'how to learn from it', and is meta-learned by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
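
As a toy analogue of meta-learning an update rule, the sketch below parameterizes a per-arm prediction update for a bandit agent and selects the parameters by how well agents trained with that rule perform across a small set of environments. Everything here (the bandit environments, the rule's form, the random meta-search) is a hypothetical simplification, not the paper's method.

```python
# Hedged toy sketch: search over the parameters of an update rule by the
# performance of agents trained with it across several environments.
import numpy as np

rng = np.random.default_rng(3)

def train_agent(eta, env_probs, steps=500):
    """Greedy two-armed bandit agent whose per-arm prediction y is updated
    by the parameterised rule: y <- y + eta[0] * (reward - eta[1] * y)."""
    y = np.zeros(2)
    total = 0.0
    for _ in range(steps):
        arm = int(np.argmax(y)) if rng.random() > 0.1 else int(rng.integers(2))
        reward = float(rng.random() < env_probs[arm])
        y[arm] += eta[0] * (reward - eta[1] * y[arm])
        total += reward
    return total / steps

# A small set of environments (bandits with different reward probabilities).
envs = [np.array([0.2, 0.8]), np.array([0.9, 0.1]), np.array([0.4, 0.6])]

# Meta-search over update-rule parameters by plain random search.
best_eta, best_score = None, -np.inf
for _ in range(50):
    eta = rng.uniform(0.0, 1.0, size=2)
    score = np.mean([train_agent(eta, env) for env in envs])
    if score > best_score:
        best_eta, best_score = eta, score

print(f"discovered rule parameters: {best_eta}, mean return: {best_score:.3f}")
```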
This list is automatically generated from the titles and abstracts of the papers on this site.