Interpretable Off-Policy Learning via Hyperbox Search
- URL: http://arxiv.org/abs/2203.02473v2
- Date: Mon, 26 Jun 2023 13:04:47 GMT
- Title: Interpretable Off-Policy Learning via Hyperbox Search
- Authors: Daniel Tschernutter, Tobias Hatt, Stefan Feuerriegel
- Abstract summary: We propose an algorithm for interpretable off-policy learning via hyperbox search.
Our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible.
We demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret.
- Score: 20.83151214072516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized treatment decisions have become an integral part of modern
medicine. The aim is to tailor treatment decisions to individual
patient characteristics. Numerous methods have been developed for learning such
policies from observational data that achieve the best outcome across a certain
policy class. Yet these methods are rarely interpretable. However,
interpretability is often a prerequisite for policy learning in clinical
practice. In this paper, we propose an algorithm for interpretable off-policy
learning via hyperbox search. In particular, our policies can be represented in
disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible. We prove
a universal approximation theorem that shows that our policy class is flexible
enough to approximate any measurable function arbitrarily well. For
optimization, we develop a tailored column generation procedure within a
branch-and-bound framework. Using a simulation study, we demonstrate that our
algorithm outperforms state-of-the-art methods from interpretable off-policy
learning in terms of regret. Using real-world clinical data, we perform a user
study with actual clinical experts, who rate our policies as highly
interpretable.
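To make the OR-of-ANDs structure concrete, here is a minimal sketch (not the authors' implementation): each hyperbox is an AND of interval conditions on patient covariates, a DNF policy treats a patient whenever any box contains them, and the policy is scored on observational data with a standard inverse-propensity-weighted value estimate. All feature names, thresholds, and records below are hypothetical.

```python
import numpy as np

# A hyperbox is an AND of interval conditions on covariates, e.g.
# {"age": (50, 80), "sbp": (140, inf)} reads: 50 <= age <= 80 AND sbp >= 140.
# A DNF policy treats a patient if ANY of its hyperboxes contains them (OR-of-ANDs).
HYPOTHETICAL_BOXES = [
    {"age": (50.0, 80.0), "sbp": (140.0, np.inf)},
    {"bmi": (35.0, np.inf)},
]

def dnf_policy(x, boxes):
    """Return 1 (treat) if the covariate dict x lies inside any hyperbox, else 0."""
    for box in boxes:
        if all(lo <= x[feat] <= hi for feat, (lo, hi) in box.items()):
            return 1
    return 0

def ipw_value(records, boxes):
    """Inverse-propensity-weighted estimate of the policy value from observational data.

    Each record holds covariates x, the observed treatment a in {0, 1}, the outcome y,
    and a propensity estimate e = P(A = 1 | x) obtained elsewhere (hypothetical here)."""
    terms = []
    for rec in records:
        pi_a = dnf_policy(rec["x"], boxes)
        prob_observed_action = rec["e"] if rec["a"] == 1 else 1.0 - rec["e"]
        matches_policy = 1.0 if rec["a"] == pi_a else 0.0
        terms.append(matches_policy / prob_observed_action * rec["y"])
    return float(np.mean(terms))

# Two toy observational records, purely for illustration.
records = [
    {"x": {"age": 62.0, "sbp": 150.0, "bmi": 28.0}, "a": 1, "y": 1.0, "e": 0.4},
    {"x": {"age": 45.0, "sbp": 120.0, "bmi": 24.0}, "a": 0, "y": 0.5, "e": 0.3},
]
print(ipw_value(records, HYPOTHETICAL_BOXES))
```

The hard part, which the paper addresses with column generation inside a branch-and-bound framework, is searching over box configurations to maximize such a value estimate; the sketch above only fixes one DNF-of-hyperboxes policy and evaluates it.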
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- Quasi-optimal Reinforcement Learning with Continuous Actions [8.17049210746654]
We develop a novel quasi-optimal learning algorithm, which can be easily optimized in off-policy settings.
We evaluate our algorithm with comprehensive simulated experiments and a real-world dose-suggestion application on the Ohio Type 1 Diabetes dataset.
arXiv Detail & Related papers (2023-01-21T11:30:13Z)
- Scheduling with Predictions [0.0]
Modern learning techniques have made it possible to detect abnormalities in medical images within minutes.
Machine-assisted diagnoses cannot yet reliably replace human reviews of images by a radiologist.
We study this scenario by formulating it as a learning-augmented online scheduling problem.
arXiv Detail & Related papers (2022-12-20T17:10:06Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
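As a hedged illustration of the pessimism principle (not the paper's PPL algorithm), one can rank candidate policies by a lower confidence bound, i.e., the point estimate minus a multiple of its standard error, so that estimates supported by poor overlap or few samples are penalized; the candidate values below are hypothetical.

```python
# Hypothetical candidate policies with an estimated value and its standard error.
candidates = {
    "policy_A": {"value": 0.62, "se": 0.03},
    "policy_B": {"value": 0.70, "se": 0.15},  # higher point estimate, but far more uncertain
}

def lower_confidence_bound(value, se, z=1.96):
    """Pessimistic score: point estimate minus a multiple of its standard error."""
    return value - z * se

# Selection by point estimate picks policy_B; pessimistic selection picks policy_A.
best_point = max(candidates, key=lambda name: candidates[name]["value"])
best_pessimistic = max(candidates, key=lambda name: lower_confidence_bound(**candidates[name]))
print(best_point, best_pessimistic)  # policy_B policy_A
```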
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Inverse Contextual Bandits: Learning How Behavior Evolves over Time [89.59391124399927]
We seek an approach to policy learning that provides interpretable representations of decision-making.
First, we model the behavior of learning agents in terms of contextual bandits, and formalize the problem of inverse contextual bandits (ICB).
Second, we propose two algorithms to tackle ICB, each making varying degrees of assumptions regarding the agent's learning strategy.
arXiv Detail & Related papers (2021-07-13T18:24:18Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search; a brief sketch of the reward-conditioning idea follows this entry.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
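For the reward-conditioned policies entry above, a minimal hedged sketch of the idea (not the paper's method): fit a policy by supervised regression on its own logged data, with the achieved return supplied as an extra input, then query it with a high target return at deployment. The toy linear model and synthetic data are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Logged experience: states, the actions taken, and the returns those actions achieved.
states = rng.normal(size=(200, 3))
actions = rng.normal(size=(200, 1))
returns = -np.sum((actions - states[:, :1]) ** 2, axis=1, keepdims=True)  # toy return signal

# Supervised step: regress action on (state, achieved return) via least squares.
inputs = np.hstack([states, returns, np.ones((200, 1))])  # bias column
weights, *_ = np.linalg.lstsq(inputs, actions, rcond=None)

# Deployment: condition the learned policy on a HIGH target return to request a good action.
def reward_conditioned_action(state, target_return):
    features = np.concatenate([state, [target_return], [1.0]])
    return features @ weights

print(reward_conditioned_action(np.array([0.5, -0.2, 0.1]), target_return=0.0))
```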
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.