Interpretable Off-Policy Learning via Hyperbox Search
- URL: http://arxiv.org/abs/2203.02473v2
- Date: Mon, 26 Jun 2023 13:04:47 GMT
- Title: Interpretable Off-Policy Learning via Hyperbox Search
- Authors: Daniel Tschernutter, Tobias Hatt, Stefan Feuerriegel
- Abstract summary: We propose an algorithm for interpretable off-policy learning via hyperbox search.
Our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible.
We demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret.
- Score: 20.83151214072516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized treatment decisions have become an integral part of modern
medicine. The aim is to tailor treatment decisions to individual
patient characteristics. Numerous methods have been developed for learning such
policies from observational data that achieve the best outcome across a certain
policy class. Yet these methods are rarely interpretable. However,
interpretability is often a prerequisite for policy learning in clinical
practice. In this paper, we propose an algorithm for interpretable off-policy
learning via hyperbox search. In particular, our policies can be represented in
disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible. We prove
a universal approximation theorem that shows that our policy class is flexible
enough to approximate any measurable function arbitrarily well. For
optimization, we develop a tailored column generation procedure within a
branch-and-bound framework. Using a simulation study, we demonstrate that our
algorithm outperforms state-of-the-art methods from interpretable off-policy
learning in terms of regret. Using real-world clinical data, we perform a user
study with actual clinical experts, who rate our policies as highly
interpretable.
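To make the OR-of-ANDs structure concrete, here is a minimal sketch (not the authors' implementation): each hyperbox is an AND of interval conditions on patient covariates, a DNF policy treats a patient whenever any box contains them, and the policy is scored on observational data with a standard inverse-propensity-weighted value estimate. All feature names, thresholds, and records below are hypothetical.

```python
import numpy as np

# A hyperbox is an AND of interval conditions on covariates, e.g.
# {"age": (50, 80), "sbp": (140, inf)} reads: 50 <= age <= 80 AND sbp >= 140.
# A DNF policy treats a patient if ANY of its hyperboxes contains them (OR-of-ANDs).
HYPOTHETICAL_BOXES = [
    {"age": (50.0, 80.0), "sbp": (140.0, np.inf)},
    {"bmi": (35.0, np.inf)},
]

def dnf_policy(x, boxes):
    """Return 1 (treat) if the covariate dict x lies inside any hyperbox, else 0."""
    for box in boxes:
        if all(lo <= x[feat] <= hi for feat, (lo, hi) in box.items()):
            return 1
    return 0

def ipw_value(records, boxes):
    """Inverse-propensity-weighted estimate of the policy value from observational data.

    Each record holds covariates x, the observed treatment a in {0, 1}, the outcome y,
    and a propensity estimate e = P(A = 1 | x) obtained elsewhere (hypothetical here)."""
    terms = []
    for rec in records:
        pi_a = dnf_policy(rec["x"], boxes)
        prob_observed_action = rec["e"] if rec["a"] == 1 else 1.0 - rec["e"]
        matches_policy = 1.0 if rec["a"] == pi_a else 0.0
        terms.append(matches_policy / prob_observed_action * rec["y"])
    return float(np.mean(terms))

# Two toy observational records, purely for illustration.
records = [
    {"x": {"age": 62.0, "sbp": 150.0, "bmi": 28.0}, "a": 1, "y": 1.0, "e": 0.4},
    {"x": {"age": 45.0, "sbp": 120.0, "bmi": 24.0}, "a": 0, "y": 0.5, "e": 0.3},
]
print(ipw_value(records, HYPOTHETICAL_BOXES))
```

The hard part, which the paper addresses with column generation inside a branch-and-bound framework, is searching over box configurations to maximize such a value estimate; the sketch above only fixes one DNF-of-hyperboxes policy and evaluates it.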
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- Quasi-optimal Reinforcement Learning with Continuous Actions [8.17049210746654]
We develop a novel quasi-optimal learning algorithm, which can be easily optimized in off-policy settings.
We evaluate our algorithm with comprehensive simulated experiments and a real-world dose-suggestion application on the Ohio Type 1 Diabetes dataset.
arXiv Detail & Related papers (2023-01-21T11:30:13Z)
- Scheduling with Predictions [0.0]
Modern learning techniques have made it possible to detect abnormalities in medical images within minutes.
Machine-assisted diagnoses cannot yet reliably replace human reviews of images by a radiologist.
We study this scenario by formulating it as a learning-augmented online scheduling problem.
arXiv Detail & Related papers (2022-12-20T17:10:06Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
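As a hedged illustration of the pessimism principle (not the paper's PPL algorithm), one can rank candidate policies by a lower confidence bound, i.e., the point estimate minus a multiple of its standard error, so that estimates supported by poor overlap or few samples are penalized; the candidate values below are hypothetical.

```python
# Hypothetical candidate policies with an estimated value and its standard error.
candidates = {
    "policy_A": {"value": 0.62, "se": 0.03},
    "policy_B": {"value": 0.70, "se": 0.15},  # higher point estimate, but far more uncertain
}

def lower_confidence_bound(value, se, z=1.96):
    """Pessimistic score: point estimate minus a multiple of its standard error."""
    return value - z * se

# Selection by point estimate picks policy_B; pessimistic selection picks policy_A.
best_point = max(candidates, key=lambda name: candidates[name]["value"])
best_pessimistic = max(candidates, key=lambda name: lower_confidence_bound(**candidates[name]))
print(best_point, best_pessimistic)  # policy_B policy_A
```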
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Inverse Contextual Bandits: Learning How Behavior Evolves over Time [89.59391124399927]
We seek an approach to policy learning that provides interpretable representations of decision-making.
First, we model the behavior of learning agents in terms of contextual bandits, and formalize the problem of inverse contextual bandits (ICB).
Second, we propose two algorithms to tackle ICB, each making varying degrees of assumptions regarding the agent's learning strategy.
arXiv Detail & Related papers (2021-07-13T18:24:18Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search; a brief sketch of the reward-conditioning idea follows this entry.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
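For the reward-conditioned policies entry above, a minimal hedged sketch of the idea (not the paper's method): fit a policy by supervised regression on its own logged data, with the achieved return supplied as an extra input, then query it with a high target return at deployment. The toy linear model and synthetic data are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Logged experience: states, the actions taken, and the returns those actions achieved.
states = rng.normal(size=(200, 3))
actions = rng.normal(size=(200, 1))
returns = -np.sum((actions - states[:, :1]) ** 2, axis=1, keepdims=True)  # toy return signal

# Supervised step: regress action on (state, achieved return) via least squares.
inputs = np.hstack([states, returns, np.ones((200, 1))])  # bias column
weights, *_ = np.linalg.lstsq(inputs, actions, rcond=None)

# Deployment: condition the learned policy on a HIGH target return to request a good action.
def reward_conditioned_action(state, target_return):
    features = np.concatenate([state, [target_return], [1.0]])
    return features @ weights

print(reward_conditioned_action(np.array([0.5, -0.2, 0.1]), target_return=0.0))
```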
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.