Explaining Reinforcement Learning Agents Through Counterfactual Action
Outcomes
- URL: http://arxiv.org/abs/2312.11118v1
- Date: Mon, 18 Dec 2023 11:34:58 GMT
- Title: Explaining Reinforcement Learning Agents Through Counterfactual Action
Outcomes
- Authors: Yotam Amitai, Yael Septon and Ofra Amir
- Abstract summary: We propose "COViz", a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one.
In contrast to most local explanations that provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes.
- Score: 9.108253909440489
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Explainable reinforcement learning (XRL) methods aim to help elucidate agent
policies and decision-making processes. The majority of XRL approaches focus on
local explanations, seeking to shed light on the reasons an agent acts the way
it does at a specific world state. While such explanations are both useful and
necessary, they typically do not portray the outcomes of the agent's selected
choice of action. In this work, we propose "COViz", a new local explanation
method that visually compares the outcome of an agent's chosen action to a
counterfactual one. In contrast to most local explanations that provide
state-limited observations of the agent's motivation, our method depicts
alternative trajectories the agent could have taken from the given state and
their outcomes. We evaluated the usefulness of COViz in supporting people's
understanding of agents' preferences and compared it with reward decomposition,
a local explanation method that describes an agent's expected utility for
different actions by decomposing it into meaningful reward types. Furthermore,
we examine the complementary benefits of integrating both methods. Our results
show that such integration significantly improved participants' performance.
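The abstract contrasts COViz with reward decomposition, which explains an action by splitting its expected utility into meaningful reward types. A minimal sketch of that idea, assuming a toy agent with two hypothetical reward types ("progress" and "safety") and illustrative per-component Q-values (not taken from the paper):

```python
# Per-component Q-values for one state: Q_c(s, a) for each reward type c.
# The reward types and numbers here are illustrative assumptions only.
q_components = {
    "progress": {"left": 0.2, "right": 0.8},
    "safety":   {"left": 0.5, "right": -0.3},
}

def total_q(action):
    """Total expected utility is the sum over the per-type components."""
    return sum(comp[action] for comp in q_components.values())

def explain(action):
    """Report how much each reward type contributes to the action's value."""
    return {rtype: comp[action] for rtype, comp in q_components.items()}

# The agent picks the action with the highest total Q; the decomposition
# then shows *why*: which reward types drove that choice.
best = max(["left", "right"], key=total_q)
print(best, explain(best))
```

In this toy state the decomposition reveals that "left" wins on safety despite "right" scoring higher on progress, which is exactly the kind of preference trade-off such explanations aim to surface.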
Related papers
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Integrating Policy Summaries with Reward Decomposition for Explaining Reinforcement Learning Agents [3.8520321531809705]
Methods that help users understand the behavior of such agents can roughly be divided into local explanations and global explanations.
We study a novel combination of local and global explanations for reinforcement learning agents.
arXiv Detail & Related papers (2022-10-21T08:57:46Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents are agents that employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- "I Don't Think So": Disagreement-Based Policy Summaries for Comparing Agents [2.6270468656705765]
We propose a novel method for generating contrastive summaries that highlight the differences between agents' policies.
Our results show that the novel disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS.
arXiv Detail & Related papers (2021-02-05T09:09:00Z)
- What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes [30.056732656973637]
We present a novel form of explanation for Reinforcement Learning, based around the notion of intended outcome.
These explanations describe the outcome an agent is trying to achieve by its actions.
We provide a simple proof that general methods for post-hoc explanations of this nature are impossible in traditional reinforcement learning.
arXiv Detail & Related papers (2020-11-10T12:05:08Z)
- Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps [4.568911586155097]
We combine global and local explanations for reinforcement learning agents.
We augment strategy summaries that extract important trajectories of states from simulations with saliency maps.
We find mixed results with respect to augmenting demonstrations with saliency maps.
arXiv Detail & Related papers (2020-05-18T16:44:55Z)
- Incentivizing Exploration with Selective Data Disclosure [94.32975679779491]
We propose and design recommendation systems that incentivize efficient exploration.
Agents arrive sequentially, choose actions and receive rewards, drawn from fixed but unknown action-specific distributions.
We attain optimal regret rate for exploration using a flexible frequentist behavioral model.
arXiv Detail & Related papers (2018-11-14T19:29:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.