Inverse Decision Modeling: Learning Interpretable Representations of
Behavior
- URL: http://arxiv.org/abs/2310.18591v1
- Date: Sat, 28 Oct 2023 05:05:01 GMT
- Title: Inverse Decision Modeling: Learning Interpretable Representations of
Behavior
- Authors: Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar
- Abstract summary: We develop an expressive, unifying perspective on inverse decision modeling.
We use this to formalize the inverse problem (as a descriptive model).
We illustrate how this structure enables learning (interpretable) representations of (bounded) rationality.
- Score: 72.80902932543474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision analysis deals with modeling and enhancing decision processes. A
principal challenge in improving behavior is in obtaining a transparent
description of existing behavior in the first place. In this paper, we develop
an expressive, unifying perspective on inverse decision modeling: a framework
for learning parameterized representations of sequential decision behavior.
First, we formalize the forward problem (as a normative standard), subsuming
common classes of control behavior. Second, we use this to formalize the
inverse problem (as a descriptive model), generalizing existing work on
imitation/reward learning -- while opening up a much broader class of research
problems in behavior representation. Finally, we instantiate this approach with
an example (inverse bounded rational control), illustrating how this structure
enables learning (interpretable) representations of (bounded) rationality --
while naturally capturing intuitive notions of suboptimal actions, biased
beliefs, and imperfect knowledge of environments.
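To make the forward/inverse distinction in the abstract concrete, here is a minimal toy sketch. It is not the paper's algorithm: it assumes a single-state problem with known action values (`q_values`) and a Boltzmann-rational agent whose only unknown is an inverse temperature `beta`, a common stand-in for bounded rationality. All names, values, and the grid search are hypothetical choices made for illustration.

```python
import math
import random


def boltzmann_policy(q_values, beta):
    """Forward problem (toy): action probabilities proportional to exp(beta * Q)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(actions, q_values, beta):
    """Log-likelihood of observed actions under the Boltzmann policy."""
    probs = boltzmann_policy(q_values, beta)
    return sum(math.log(probs[a]) for a in actions)


def fit_beta(actions, q_values, grid):
    """Inverse problem (toy): pick the beta on the grid that best explains the actions."""
    return max(grid, key=lambda b: log_likelihood(actions, q_values, b))


# Assumed toy setup: three actions with known values, agent with beta = 2.0.
rng = random.Random(0)
q_values = [1.0, 0.5, 0.0]
true_beta = 2.0

# Simulate demonstrations from the forward model.
demo_policy = boltzmann_policy(q_values, true_beta)
actions = rng.choices(range(len(q_values)), weights=demo_policy, k=5000)

# Recover the rationality parameter by maximum likelihood over a grid.
grid = [0.1 * i for i in range(0, 101)]  # beta in [0, 10]
beta_hat = fit_beta(actions, q_values, grid)
print(f"true beta = {true_beta}, estimated beta = {beta_hat:.1f}")
```

A small `beta` describes near-random (strongly bounded) behavior and a large `beta` near-optimal behavior, so the fitted value is a single interpretable parameter. The framework described in the abstract is much broader and also covers biased beliefs and imperfect knowledge of the environment, which this sketch does not model.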
Related papers
- Causal Abstraction in Model Interpretability: A Compact Survey [5.963324728136442]
Causal abstraction provides a principled approach to understanding and explaining the causal mechanisms underlying model behavior.
This survey paper delves into the realm of causal abstraction, examining its theoretical foundations, practical applications, and implications for the field of model interpretability.
(2024-10-26T12:24:28Z)
- Generally-Occurring Model Change for Robust Counterfactual Explanations [1.3121410433987561]
We study the robustness of counterfactual explanation generation algorithms to model changes.
In this paper, we first generalize the concept of Naturally-Occurring Model Change.
We also propose a more general concept of model parameter changes, Generally-Occurring Model Change.
(2024-07-16T06:44:00Z)
- Representation Surgery: Theory and Practice of Affine Steering [72.61363182652853]
Language models often exhibit undesirable behavior, e.g., generating toxic or gender-biased text.
One natural (and common) approach to prevent the model from exhibiting undesirable behavior is to steer the model's representations.
This paper investigates the formal and empirical properties of steering functions.
(2024-02-15T00:20:30Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
(2023-09-30T20:59:42Z)
- Towards a Grounded Theory of Causation for Embodied AI [12.259552039796027]
Existing frameworks give no indication as to which behaviour policies or physical transformations of state space shall count as interventions.
The framework sketched in this paper describes actions as transformations of state space, for instance induced by an agent running a policy.
This makes it possible to describe in a uniform way both transformations of the micro-state space and abstract models thereof.
(2022-06-28T12:56:43Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
(2022-03-14T17:40:42Z)
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
(2021-11-29T18:59:09Z)
- Towards Interpretable Reasoning over Paragraph Effects in Situation [126.65672196760345]
We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand the cause and effect.
We propose a sequential approach for this task which explicitly models each step of the reasoning process with neural network modules.
In particular, five reasoning modules are designed and learned in an end-to-end manner, which leads to a more interpretable model.
(2020-10-03T04:03:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.