Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy
Learning and Evaluation Method
- URL: http://arxiv.org/abs/2402.11123v1
- Date: Fri, 16 Feb 2024 23:13:05 GMT
- Title: Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy
Learning and Evaluation Method
- Authors: Yong Huang, Charles A. Downs, Amir M. Rahmani
- Abstract summary: Warfarin, an anticoagulant medication, is formulated to prevent and address conditions associated with abnormal blood clotting.
Finding a suitable dosage remains challenging due to individual response variations, and prescribing an incorrect dosage may lead to severe consequences.
We used contextual bandits and reinforcement learning to determine an optimal personalized dosing strategy.
- Score: 2.8806234438838256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Warfarin, an anticoagulant medication, is formulated to prevent and address
conditions associated with abnormal blood clotting, making it one of the most
prescribed drugs globally. However, determining a suitable dosage remains
challenging due to individual response variations, and prescribing an incorrect
dosage may lead to severe consequences. Contextual bandit and reinforcement
learning have shown promise in addressing this issue. Given the wide
availability of observational data and safety concerns of decision-making in
healthcare, we focused on using exclusively observational data from historical
policies as demonstrations to derive new policies; we utilized offline policy
learning and evaluation in a contextual bandit setting to establish the optimal
personalized dosage strategy. Our learned policies surpassed baseline
approaches without requiring genotype inputs, even when given a suboptimal
demonstration, showcasing promising potential for application.
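To make the offline-learning setup described in the abstract concrete, the sketch below learns a dosing policy from logged (observational) data alone using a simple direct method: fit a per-action reward model on the logged context-action-reward triples, then act greedily on the predictions. The three dose bins, the patient features, the synthetic logging policy, and the reward definition are all illustrative assumptions, not the data or models actually used in the paper.

```python
# Minimal offline policy learning sketch for a dose-bin contextual bandit.
# All data here is synthetic; features, bins, and rewards are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
N, D, K = 5000, 8, 3                 # logged patients, context features, dose bins (e.g., low/medium/high)

X = rng.normal(size=(N, D))          # patient contexts (clinical/demographic features, hypothetical)
true_w = rng.normal(size=(D, K))
best = (X @ true_w).argmax(axis=1)   # synthetic "correct" dose bin per patient
A = rng.integers(0, K, size=N)       # dose bins chosen by a historical (behavior) policy
R = (A == best).astype(float)        # reward: 1 if the logged bin matches the correct bin

# Direct method: one reward regressor per dose bin, trained only on that bin's logged data.
models = [Ridge(alpha=1.0).fit(X[A == a], R[A == a]) for a in range(K)]

def learned_policy(contexts):
    """Greedy policy: choose the dose bin with the highest predicted reward."""
    q = np.column_stack([m.predict(contexts) for m in models])
    return q.argmax(axis=1)

print("agreement with the synthetic optimum:", (learned_policy(X) == best).mean())
```

Because no new dosing decisions are executed during learning, the only data requirement here is the logged triples (context, chosen dose bin, observed outcome), which matches the paper's emphasis on using exclusively observational data from historical policies.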
Related papers
- Customize Multi-modal RAI Guardrails with Precedent-based predictions [55.63757336900865]
A multi-modal guardrail must effectively filter image content based on user-defined policies.
Existing fine-tuning methods typically condition predictions on pre-defined policies.
We propose to condition the model's judgment on "precedents", which are the reasoning processes of prior data points similar to the given input.
arXiv Detail & Related papers (2025-07-28T03:45:34Z)
- Pragmatic Policy Development via Interpretable Behavior Cloning [6.177449809243359]
We propose deriving treatment policies from the most frequently chosen actions in each patient state, as estimated by an interpretable model of the behavior policy.
We demonstrate that policies derived under this framework can outperform current practice, offering interpretable alternatives to those obtained via offline RL.
arXiv Detail & Related papers (2025-07-22T22:34:35Z)
- Safe and Interpretable Estimation of Optimal Treatment Regimes [54.257304443780434]
We operationalize a safe and interpretable framework to identify optimal treatment regimes.
Our findings support personalized treatment strategies based on a patient's medical history and pharmacological features.
arXiv Detail & Related papers (2023-10-23T19:59:10Z)
- Optimal and Fair Encouragement Policy Evaluation and Learning [11.712023983596914]
We study causal identification and robust estimation of optimal treatment rules, including under potential violations of positivity.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
We illustrate the methods in three case studies based on data from reminders of SNAP benefits, randomized encouragement to enroll in insurance, and from pretrial supervised release with electronic monitoring.
arXiv Detail & Related papers (2023-09-12T20:45:30Z)
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
- Evaluating COVID-19 vaccine allocation policies using Bayesian $m$-top exploration [53.122045119395594]
We present a novel technique for evaluating vaccine allocation strategies using a multi-armed bandit framework.
$m$-top exploration allows the algorithm to learn $m$ policies for which it expects the highest utility.
We consider the Belgian COVID-19 epidemic using the individual-based model STRIDE, where we learn a set of vaccination policies.
arXiv Detail & Related papers (2023-01-30T12:22:30Z)
- Relative Sparsity for Medical Decision Problems [0.0]
It is often important to explain to the healthcare provider, and to the patient, how a new policy differs from the current standard of care.
We propose a criterion for selecting $\lambda$, perform simulations, and illustrate our method with a real, observational healthcare dataset.
arXiv Detail & Related papers (2022-11-29T20:00:11Z)
- Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z)
- Interpretable Off-Policy Learning via Hyperbox Search [20.83151214072516]
We propose an algorithm for interpretable off-policy learning via hyperbox search.
Our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible.
We demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret.
arXiv Detail & Related papers (2022-03-04T18:10:24Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches (for contrast, a sketch of the standard discrete-action doubly robust estimator appears after this list).
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
- Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation [2.908482270923597]
Our aim is to establish a framework in which retrospectively optimizing interventions with reinforcement learning (RL) gives us a regulatory-compliant pathway to prospective clinical testing of the learned policies.
We focus on infections in intensive care units, which are among the major causes of death and are difficult to treat because of the complex and opaque patient dynamics.
arXiv Detail & Related papers (2020-03-13T20:31:47Z)
- Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
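Several of the entries above (e.g., "Reliable Off-policy Evaluation for Reinforcement Learning" and "Doubly Robust Off-Policy Value and Gradient Estimation") revolve around estimating a new policy's value from logged data without deploying it. Below is a minimal sketch of the two standard contextual-bandit estimators, inverse propensity scoring (IPS) and the doubly robust (DR) estimator; the toy data, uniform logging propensities, and deliberately crude reward model are assumptions for illustration only, not the methods of any paper listed here.

```python
# Off-policy evaluation sketch for a discrete-action contextual bandit.
# IPS reweights logged rewards by the target/behavior probability ratio;
# DR adds a learned reward model as a control variate to reduce variance.
import numpy as np

def ips_value(target_probs, behavior_probs, actions, rewards):
    """Inverse propensity scoring estimate of the target policy's value."""
    idx = np.arange(len(rewards))
    w = target_probs[idx, actions] / behavior_probs[idx, actions]
    return np.mean(w * rewards)

def dr_value(target_probs, behavior_probs, actions, rewards, q_hat):
    """Doubly robust estimate; q_hat[i, a] is a fitted reward model."""
    idx = np.arange(len(rewards))
    w = target_probs[idx, actions] / behavior_probs[idx, actions]
    baseline = (target_probs * q_hat).sum(axis=1)      # E_{a ~ target}[q_hat(x, a)]
    correction = w * (rewards - q_hat[idx, actions])   # importance-weighted residual
    return np.mean(baseline + correction)

# Toy logged data (illustrative only).
rng = np.random.default_rng(1)
n, k = 1000, 3
behavior = np.full((n, k), 1.0 / k)                    # uniform logging policy
actions = rng.integers(0, k, size=n)
rewards = rng.binomial(1, 0.3 + 0.2 * (actions == 2))  # action 2 is slightly better
target = np.zeros((n, k))
target[:, 2] = 1.0                                     # target policy: always pick action 2
q_hat = np.full((n, k), rewards.mean())                # deliberately crude reward model

print("IPS estimate:", ips_value(target, behavior, actions, rewards))
print("DR estimate: ", dr_value(target, behavior, actions, rewards, q_hat))
```

With a deterministic target policy, IPS uses only the logged samples whose action the target would also take (roughly a third here), which is exactly the variance problem the DR correction term is meant to dampen.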
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.