Inverse Contextual Bandits: Learning How Behavior Evolves over Time
- URL: http://arxiv.org/abs/2107.06317v1
- Date: Tue, 13 Jul 2021 18:24:18 GMT
- Title: Inverse Contextual Bandits: Learning How Behavior Evolves over Time
- Authors: Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
- Abstract summary: We seek an approach to policy learning that provides interpretable representations of decision-making.
First, we model the behavior of learning agents in terms of contextual bandits, and formalize the problem of inverse contextual bandits (ICB).
Second, we propose two algorithms to tackle ICB, each making varying degrees of assumptions regarding the agent's learning strategy.
- Score: 89.59391124399927
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding an agent's priorities by observing their behavior is critical
for transparency and accountability in decision processes, such as in
healthcare. While conventional approaches to policy learning almost invariably
assume stationarity in behavior, this is hardly true in practice: Medical
practice is constantly evolving, and clinical professionals are constantly
fine-tuning their priorities. We desire an approach to policy learning that
provides (1) interpretable representations of decision-making, accounts for (2)
non-stationarity in behavior, as well as operating in an (3) offline manner.
First, we model the behavior of learning agents in terms of contextual bandits,
and formalize the problem of inverse contextual bandits (ICB). Second, we
propose two algorithms to tackle ICB, each making varying degrees of
assumptions regarding the agent's learning strategy. Finally, through both real
and simulated data for liver transplantation, we illustrate the applicability
and explainability of our method and validate its accuracy.
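To make the ICB setting concrete, here is a minimal sketch: a simulated agent chooses arms via a linear contextual bandit whose preference weights drift over time, and an observer tries to recover those time-varying weights from the logged contexts and actions alone. The drift model, the sliding-window softmax fit, and all names (`theta_at`, `fit_window`, the window size) are illustrative assumptions, not the paper's two algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, d = 3000, 4, 5   # time steps, arms, context dimension

# Forward model: a learning agent whose priorities drift over time.
# It scores arms with a linear reward model and picks greedily; the
# weight vector interpolates between an early and a late preference,
# so the observed behavior is non-stationary.
theta_early, theta_late = rng.normal(size=d), rng.normal(size=d)

def theta_at(t):
    """The agent's (hidden) preference vector at time t."""
    a = t / (T - 1)
    return (1 - a) * theta_early + a * theta_late

contexts = rng.normal(size=(T, K, d))   # per-arm feature vectors
actions = np.array([int(np.argmax(contexts[t] @ theta_at(t))) for t in range(T)])

# Inverse step (illustrative only): recover how the preferences evolve
# from the logged (context, action) pairs, by fitting a softmax choice
# model independently inside sliding windows of time.
def fit_window(ts, lr=0.5, iters=200):
    """Gradient ascent on the log-likelihood of the observed choices."""
    w = np.zeros(d)
    for _ in range(iters):
        grad = np.zeros(d)
        for t in ts:
            logits = contexts[t] @ w
            p = np.exp(logits - logits.max())
            p /= p.sum()
            grad += contexts[t, actions[t]] - p @ contexts[t]
        w += lr * grad / len(ts)
    return w

window = 500
for start in range(0, T, window):
    ts = range(start, min(start + window, T))
    w = fit_window(ts)
    true = theta_at(start + window // 2)
    cos = w @ true / (np.linalg.norm(w) * np.linalg.norm(true) + 1e-12)
    print(f"steps {start:4d}-{ts[-1]:4d}: cosine(recovered, true) = {cos:.3f}")
```

Under these assumptions, the recovered direction tracks the drifting weights across windows even though no single stationary policy fits the whole log, which is the evolving-behavior signal ICB is after.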
Related papers
- Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning [72.80902932543474]
Understanding human behavior from observed data is critical for transparency and accountability in decision-making.
Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging.
We propose a data-driven representation of decision-making behavior that inheres transparency by design, accommodates partial observability, and operates completely offline.
arXiv Detail & Related papers (2023-10-28T13:06:14Z)
- Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
To understand the decision-making processes underlying a set of observed trajectories, we model agents as learning online and cast policy inference as the inverse of that online learning problem.
We introduce a practical algorithm for retrospectively estimating the effects agents perceive their actions to have, alongside the process through which agents update those estimates.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Interpretable Off-Policy Learning via Hyperbox Search [20.83151214072516]
We propose an algorithm for interpretable off-policy learning via hyperbox search.
Our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible.
We demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret.
arXiv Detail & Related papers (2022-03-04T18:10:24Z)
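To illustrate why the hyperbox policies in the entry above are intelligible, here is a toy sketch of an OR-of-ANDs (DNF) policy. The boxes, thresholds, and feature meanings are hypothetical; the paper's actual contribution, searching for regret-minimizing boxes, is not implemented here.

```python
import numpy as np

# A hyperbox ANDs together per-feature interval conditions; a policy is
# an OR over hyperboxes, i.e. disjunctive normal form. The boxes and
# feature meanings below are invented for illustration.
Box = list[tuple[int, float, float]]   # (feature index, low, high)

boxes: list[Box] = [
    [(0, 50.0, 80.0), (1, 0.0, 1.5)],   # "age in [50, 80] AND biomarker <= 1.5"
    [(2, 120.0, np.inf)],                # OR "blood pressure >= 120"
]

def policy(x: np.ndarray) -> int:
    """Treat (1) iff x falls inside at least one hyperbox."""
    return int(any(all(lo <= x[i] <= hi for i, lo, hi in box) for box in boxes))

patients = np.array([
    [65.0, 1.0, 110.0],   # satisfies box 0 -> treat
    [40.0, 2.0, 130.0],   # satisfies box 1 -> treat
    [40.0, 2.0, 100.0],   # satisfies neither -> do not treat
])
print([policy(x) for x in patients])   # [1, 1, 0]
```

Each decision can be read off as a short rule ("treat because age and biomarker fall in box 0"), which is what makes this policy class interpretable.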
- Reinforcement Learning Your Way: Agent Characterization through Policy Regularization [0.0]
We develop a method to imbue a characteristic behaviour into agents' policies through regularization of their objective functions.
Our method guides the agents' behaviour during learning, which results in an intrinsic characterization.
In future work, we intend to employ it to develop agents that optimize individual financial customers' investment portfolios based on their spending personalities.
arXiv Detail & Related papers (2022-01-21T08:18:38Z)
- End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z)
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)