Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy
Learning and Evaluation Method
- URL: http://arxiv.org/abs/2402.11123v1
- Date: Fri, 16 Feb 2024 23:13:05 GMT
- Title: Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy
Learning and Evaluation Method
- Authors: Yong Huang, Charles A. Downs, Amir M. Rahmani
- Abstract summary: Warfarin, an anticoagulant medication, is formulated to prevent and address conditions associated with abnormal blood clotting.
Finding a suitable dosage remains challenging due to individual response variations, and prescribing an incorrect dosage may lead to severe consequences.
We used contextual bandits and reinforcement learning to determine an optimal personalized dosing strategy.
- Score: 2.8806234438838256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Warfarin, an anticoagulant medication, is formulated to prevent and address
conditions associated with abnormal blood clotting, making it one of the most
prescribed drugs globally. However, determining a suitable dosage remains
challenging due to individual response variations, and prescribing an incorrect
dosage may lead to severe consequences. Contextual bandit and reinforcement
learning have shown promise in addressing this issue. Given the wide
availability of observational data and safety concerns of decision-making in
healthcare, we focused on using exclusively observational data from historical
policies as demonstrations to derive new policies; we utilized offline policy
learning and evaluation in a contextual bandit setting to establish the optimal
personalized dosage strategy. Our learned policies surpassed baseline
approaches without requiring genotype inputs, even when given a suboptimal
demonstration, showcasing promising potential for application.
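To make the offline-learning setup described in the abstract concrete, the sketch below learns a dosing policy from logged (observational) data alone using a simple direct method: fit a per-action reward model on the logged context-action-reward triples, then act greedily on the predictions. The three dose bins, the patient features, the synthetic logging policy, and the reward definition are all illustrative assumptions, not the data or models actually used in the paper.

```python
# Minimal offline policy learning sketch for a dose-bin contextual bandit.
# All data here is synthetic; features, bins, and rewards are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
N, D, K = 5000, 8, 3                 # logged patients, context features, dose bins (e.g., low/medium/high)

X = rng.normal(size=(N, D))          # patient contexts (clinical/demographic features, hypothetical)
true_w = rng.normal(size=(D, K))
best = (X @ true_w).argmax(axis=1)   # synthetic "correct" dose bin per patient
A = rng.integers(0, K, size=N)       # dose bins chosen by a historical (behavior) policy
R = (A == best).astype(float)        # reward: 1 if the logged bin matches the correct bin

# Direct method: one reward regressor per dose bin, trained only on that bin's logged data.
models = [Ridge(alpha=1.0).fit(X[A == a], R[A == a]) for a in range(K)]

def learned_policy(contexts):
    """Greedy policy: choose the dose bin with the highest predicted reward."""
    q = np.column_stack([m.predict(contexts) for m in models])
    return q.argmax(axis=1)

print("agreement with the synthetic optimum:", (learned_policy(X) == best).mean())
```

Because no new dosing decisions are executed during learning, the only data requirement here is the logged triples (context, chosen dose bin, observed outcome), which matches the paper's emphasis on using exclusively observational data from historical policies.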
Related papers
- Customize Multi-modal RAI Guardrails with Precedent-based predictions [55.63757336900865]
A multi-modal guardrail must effectively filter image content based on user-defined policies.
Existing fine-tuning methods typically condition predictions on pre-defined policies.
We propose to condition the model's judgment on "precedents", which are the reasoning processes of prior data points similar to the given input.
arXiv Detail & Related papers (2025-07-28T03:45:34Z)
- Pragmatic Policy Development via Interpretable Behavior Cloning [6.177449809243359]
We propose deriving treatment policies from the most frequently chosen actions in each patient state, as estimated by an interpretable model of the behavior policy.
We demonstrate that policies derived under this framework can outperform current practice, offering interpretable alternatives to those obtained via offline RL.
arXiv Detail & Related papers (2025-07-22T22:34:35Z)
- Safe and Interpretable Estimation of Optimal Treatment Regimes [54.257304443780434]
We operationalize a safe and interpretable framework to identify optimal treatment regimes.
Our findings support personalized treatment strategies based on a patient's medical history and pharmacological features.
arXiv Detail & Related papers (2023-10-23T19:59:10Z)
- Optimal and Fair Encouragement Policy Evaluation and Learning [11.712023983596914]
We study causal identification and robust estimation of optimal treatment rules, including under potential violations of positivity.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
We illustrate the methods in three case studies based on data from reminders of SNAP benefits, randomized encouragement to enroll in insurance, and from pretrial supervised release with electronic monitoring.
arXiv Detail & Related papers (2023-09-12T20:45:30Z)
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
- Evaluating COVID-19 vaccine allocation policies using Bayesian $m$-top exploration [53.122045119395594]
We present a novel technique for evaluating vaccine allocation strategies using a multi-armed bandit framework.
$m$-top exploration allows the algorithm to learn $m$ policies for which it expects the highest utility.
We consider the Belgian COVID-19 epidemic using the individual-based model STRIDE, where we learn a set of vaccination policies.
arXiv Detail & Related papers (2023-01-30T12:22:30Z)
- Relative Sparsity for Medical Decision Problems [0.0]
It is often important to explain to the healthcare provider, and to the patient, how a new policy differs from the current standard of care.
We propose a criterion for selecting $\lambda$, perform simulations, and illustrate our method with a real, observational healthcare dataset.
arXiv Detail & Related papers (2022-11-29T20:00:11Z)
- Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z)
- Interpretable Off-Policy Learning via Hyperbox Search [20.83151214072516]
We propose an algorithm for interpretable off-policy learning via hyperbox search.
Our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible.
We demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret.
arXiv Detail & Related papers (2022-03-04T18:10:24Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches (for contrast, a sketch of the standard discrete-action doubly robust estimator appears after this list).
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
- Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation [2.908482270923597]
Our aim is to establish a framework in which retrospectively optimizing interventions with reinforcement learning (RL) gives us a regulatory-compliant pathway to prospective clinical testing of the learned policies.
We focus on infections in intensive care units, which are among the major causes of death and are difficult to treat because of the complex and opaque patient dynamics.
arXiv Detail & Related papers (2020-03-13T20:31:47Z)
- Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
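Several of the entries above (e.g., "Reliable Off-policy Evaluation for Reinforcement Learning" and "Doubly Robust Off-Policy Value and Gradient Estimation") revolve around estimating a new policy's value from logged data without deploying it. Below is a minimal sketch of the two standard contextual-bandit estimators, inverse propensity scoring (IPS) and the doubly robust (DR) estimator; the toy data, uniform logging propensities, and deliberately crude reward model are assumptions for illustration only, not the methods of any paper listed here.

```python
# Off-policy evaluation sketch for a discrete-action contextual bandit.
# IPS reweights logged rewards by the target/behavior probability ratio;
# DR adds a learned reward model as a control variate to reduce variance.
import numpy as np

def ips_value(target_probs, behavior_probs, actions, rewards):
    """Inverse propensity scoring estimate of the target policy's value."""
    idx = np.arange(len(rewards))
    w = target_probs[idx, actions] / behavior_probs[idx, actions]
    return np.mean(w * rewards)

def dr_value(target_probs, behavior_probs, actions, rewards, q_hat):
    """Doubly robust estimate; q_hat[i, a] is a fitted reward model."""
    idx = np.arange(len(rewards))
    w = target_probs[idx, actions] / behavior_probs[idx, actions]
    baseline = (target_probs * q_hat).sum(axis=1)      # E_{a ~ target}[q_hat(x, a)]
    correction = w * (rewards - q_hat[idx, actions])   # importance-weighted residual
    return np.mean(baseline + correction)

# Toy logged data (illustrative only).
rng = np.random.default_rng(1)
n, k = 1000, 3
behavior = np.full((n, k), 1.0 / k)                    # uniform logging policy
actions = rng.integers(0, k, size=n)
rewards = rng.binomial(1, 0.3 + 0.2 * (actions == 2))  # action 2 is slightly better
target = np.zeros((n, k))
target[:, 2] = 1.0                                     # target policy: always pick action 2
q_hat = np.full((n, k), rewards.mean())                # deliberately crude reward model

print("IPS estimate:", ips_value(target, behavior, actions, rewards))
print("DR estimate: ", dr_value(target, behavior, actions, rewards, q_hat))
```

With a deterministic target policy, IPS uses only the logged samples whose action the target would also take (roughly a third here), which is exactly the variance problem the DR correction term is meant to dampen.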
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.