Model Selection for Offline Reinforcement Learning: Practical
Considerations for Healthcare Settings
- URL: http://arxiv.org/abs/2107.11003v1
- Date: Fri, 23 Jul 2021 02:41:51 GMT
- Title: Model Selection for Offline Reinforcement Learning: Practical
Considerations for Healthcare Settings
- Authors: Shengpu Tang, Jenna Wiens
- Abstract summary: Reinforcement learning can be used to learn treatment policies and aid decision making in healthcare.
A standard validation pipeline for model selection requires running a learned policy in the actual environment, which is often infeasible in healthcare.
Our work serves as a practical guide for offline RL model selection and can help RL practitioners select policies using real-world datasets.
- Score: 13.376364233897528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) can be used to learn treatment policies and aid
decision making in healthcare. However, given the need for generalization over
complex state/action spaces, the incorporation of function approximators (e.g.,
deep neural networks) requires model selection to reduce overfitting and
improve policy performance at deployment. Yet a standard validation pipeline
for model selection requires running a learned policy in the actual
environment, which is often infeasible in a healthcare setting. In this work,
we investigate a model selection pipeline for offline RL that relies on
off-policy evaluation (OPE) as a proxy for validation performance. We present
an in-depth analysis of popular OPE methods, highlighting the additional
hyperparameters and computational requirements (fitting/inference of auxiliary
models) when used to rank a set of candidate policies. We compare the utility
of different OPE methods as part of the model selection pipeline in the context
of learning to treat patients with sepsis. Among all the OPE methods we
considered, fitted Q evaluation (FQE) consistently leads to the best validation
ranking, but at a high computational cost. To balance this trade-off between
accuracy of ranking and computational efficiency, we propose a simple two-stage
approach to accelerate model selection by avoiding potentially unnecessary
computation. Our work serves as a practical guide for offline RL model
selection and can help RL practitioners select policies using real-world
datasets. To facilitate reproducibility and future extensions, the code
accompanying this paper is available online at
https://github.com/MLD3/OfflineRL_ModelSelection.
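To make the pipeline concrete, the sketch below (not the authors' implementation; their code is at the repository above) shows OPE-based model selection in a small tabular setting: each candidate policy is scored with a simple fitted Q evaluation (FQE) on the logged transitions, and candidates are ranked by their estimated value at the initial states. The dataset format, deterministic candidate policies, and hyperparameters are illustrative assumptions.

```python
# A minimal, illustrative sketch of OPE-based model selection (tabular FQE),
# not the paper's implementation. Dataset format, deterministic candidate
# policies, and hyperparameters below are assumptions for the example.
import numpy as np

def fitted_q_evaluation(dataset, policy, n_states, n_actions,
                        gamma=0.99, n_iters=200):
    """Tabular fitted Q evaluation (FQE) of a fixed deterministic policy.

    dataset: iterable of (s, a, r, s_next, done) transitions logged by some
    behavior policy; policy: integer array mapping state -> action.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions))
        for s, a, r, s_next, done in dataset:
            # Bootstrapped target uses the *evaluated* policy's action at s_next.
            backup = r if done else r + gamma * q[s_next, policy[s_next]]
            targets[s, a] += backup
            counts[s, a] += 1
        seen = counts > 0
        q[seen] = targets[seen] / counts[seen]  # tabular "regression" = mean target
    return q

def rank_policies_by_ope(dataset, candidate_policies, initial_states,
                         n_states, n_actions, gamma=0.99):
    """Score each candidate by its FQE-estimated value at the initial states
    and return candidate indices ordered best-first, with their scores."""
    scores = []
    for policy in candidate_policies:
        q = fitted_q_evaluation(dataset, policy, n_states, n_actions, gamma)
        scores.append(np.mean([q[s, policy[s]] for s in initial_states]))
    order = np.argsort(scores)[::-1]  # higher estimated value = better rank
    return order, scores

if __name__ == "__main__":
    # Toy example: 3 states, 2 actions, a handful of logged transitions.
    dataset = [(0, 0, 0.0, 1, False), (1, 1, 1.0, 2, True),
               (0, 1, 0.0, 2, True), (1, 0, 0.0, 0, False)]
    candidates = [np.array([0, 1, 0]), np.array([1, 0, 0])]
    order, scores = rank_policies_by_ope(dataset, candidates, initial_states=[0],
                                         n_states=3, n_actions=2)
    print("ranking (best first):", order, "scores:", scores)
```

The same loop applies with function approximators (e.g., neural FQE), at a much higher computational cost per candidate. One natural way to realize the two-stage idea from the abstract is to screen candidates with a cheaper OPE estimator before running FQE only on a top-ranked subset; the exact procedure used in the paper may differ.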
Related papers
- Policy Trees for Prediction: Interpretable and Adaptive Model Selection for Machine Learning [5.877778007271621]
We introduce a tree-based approach, Optimal Predictive-Policy Trees (OP2T), that yields interpretable policies for adaptively selecting a predictive model or ensemble.
Our approach enables interpretable and adaptive model selection and rejection while only assuming access to model outputs.
We evaluate our approach on real-world datasets, including regression and classification tasks with both structured and unstructured data.
arXiv Detail & Related papers (2024-05-30T21:21:33Z)
- Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
arXiv Detail & Related papers (2024-05-29T01:49:20Z)
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models.
There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
arXiv Detail & Related papers (2024-04-22T17:20:18Z)
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism [91.52263068880484]
We study offline Reinforcement Learning with Human Feedback (RLHF).
We aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
RLHF is challenging for multiple reasons: large state space but limited human feedback, the bounded rationality of human decisions, and the off-policy distribution shift.
arXiv Detail & Related papers (2023-05-29T01:18:39Z)
- Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications [3.770564448216192]
We introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training.
We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization.
Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes.
arXiv Detail & Related papers (2023-02-15T09:30:57Z)
- Oracle Inequalities for Model Selection in Offline Reinforcement Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z)
- Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data [28.846826115837825]
Offline reinforcement learning can be used to improve future performance by leveraging historical data.
We introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy.
We show it can have substantial impacts when the dataset is small.
arXiv Detail & Related papers (2022-10-16T21:24:53Z)
- Testing Stationarity and Change Point Detection in Reinforcement Learning [10.343546104340962]
We develop a consistent procedure to test the nonstationarity of the optimal Q-function based on pre-collected historical data.
We further develop a sequential change point detection method that can be naturally coupled with existing state-of-the-art RL methods for policy optimization in nonstationary environments.
arXiv Detail & Related papers (2022-03-03T13:30:28Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision making problems in many applications.
One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee (a generic lower-confidence-bound sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- B-Pref: Benchmarking Preference-Based Reinforcement Learning [84.41494283081326]
We introduce B-Pref, a benchmark specially designed for preference-based RL.
A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly.
B-Pref alleviates this by simulating teachers with a wide array of irrationalities.
arXiv Detail & Related papers (2021-11-04T17:32:06Z)
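The pessimistic model selection entry above suggests a complementary selection rule: prefer the candidate whose value estimate remains high even under uncertainty. The sketch below illustrates that general idea with a bootstrap lower confidence bound on a weighted importance sampling (WIS) estimate; the estimator, the bound, and the data format are illustrative assumptions, not the PMS algorithm from the cited paper.

```python
# Illustrative pessimistic selection: score each candidate policy by a lower
# confidence bound (bootstrap percentile) on a WIS value estimate. This is a
# sketch of the general idea, not the cited paper's PMS algorithm.
import numpy as np

def wis_estimate(trajectories, eval_policy, behavior_policy, gamma=0.99):
    """Weighted importance sampling estimate of eval_policy's value.

    trajectories: list of trajectories, each a list of (s, a, r) tuples;
    eval_policy / behavior_policy: arrays of shape (n_states, n_actions) of
    action probabilities (behavior must be positive on every logged action).
    """
    weights, returns = [], []
    for traj in trajectories:
        w, g, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            w *= eval_policy[s, a] / behavior_policy[s, a]
            g += discount * r
            discount *= gamma
        weights.append(w)
        returns.append(g)
    weights, returns = np.array(weights), np.array(returns)
    if weights.sum() == 0.0:
        return 0.0
    return float(np.dot(weights, returns) / weights.sum())

def pessimistic_score(trajectories, eval_policy, behavior_policy,
                      n_boot=200, alpha=0.05, seed=0):
    """Bootstrap over trajectories and return the alpha-quantile (a lower
    confidence bound) of the resulting WIS estimates."""
    rng = np.random.default_rng(seed)
    n = len(trajectories)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        resample = [trajectories[i] for i in idx]
        estimates.append(wis_estimate(resample, eval_policy, behavior_policy))
    return float(np.quantile(estimates, alpha))

# Pessimistic selection: keep the candidate with the highest lower bound, e.g.
#   best = max(candidates, key=lambda pi: pessimistic_score(data, pi, behavior_probs))
```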
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content it hosts and is not responsible for any consequences arising from its use.