Explaining Reinforcement Learning Agents Through Counterfactual Action
Outcomes
- URL: http://arxiv.org/abs/2312.11118v1
- Date: Mon, 18 Dec 2023 11:34:58 GMT
- Title: Explaining Reinforcement Learning Agents Through Counterfactual Action
Outcomes
- Authors: Yotam Amitai, Yael Septon and Ofra Amir
- Abstract summary: We propose "COViz", a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one.
In contrast to most local explanations that provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes.
- Score: 9.108253909440489
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Explainable reinforcement learning (XRL) methods aim to help elucidate agent
policies and decision-making processes. The majority of XRL approaches focus on
local explanations, seeking to shed light on the reasons an agent acts the way
it does at a specific world state. While such explanations are both useful and
necessary, they typically do not portray the outcomes of the agent's selected
choice of action. In this work, we propose "COViz", a new local explanation
method that visually compares the outcome of an agent's chosen action to a
counterfactual one. In contrast to most local explanations that provide
state-limited observations of the agent's motivation, our method depicts
alternative trajectories the agent could have taken from the given state and
their outcomes. We evaluated the usefulness of COViz in supporting people's
understanding of agents' preferences and compared it with reward decomposition,
a local explanation method that describes an agent's expected utility for
different actions by decomposing it into meaningful reward types. Furthermore,
we examine the complementary benefits of integrating both methods. Our results
show that such integration significantly improved participants' performance.
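The abstract contrasts COViz with reward decomposition, which explains an action by splitting its expected utility into meaningful reward types. A minimal sketch of that idea, assuming a toy agent with two hypothetical reward types ("progress" and "safety") and illustrative per-component Q-values (not taken from the paper):

```python
# Per-component Q-values for one state: Q_c(s, a) for each reward type c.
# The reward types and numbers here are illustrative assumptions only.
q_components = {
    "progress": {"left": 0.2, "right": 0.8},
    "safety":   {"left": 0.5, "right": -0.3},
}

def total_q(action):
    """Total expected utility is the sum over the per-type components."""
    return sum(comp[action] for comp in q_components.values())

def explain(action):
    """Report how much each reward type contributes to the action's value."""
    return {rtype: comp[action] for rtype, comp in q_components.items()}

# The agent picks the action with the highest total Q; the decomposition
# then shows *why*: which reward types drove that choice.
best = max(["left", "right"], key=total_q)
print(best, explain(best))
```

In this toy state the decomposition reveals that "left" wins on safety despite "right" scoring higher on progress, which is exactly the kind of preference trade-off such explanations aim to surface.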
Related papers
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Integrating Policy Summaries with Reward Decomposition for Explaining Reinforcement Learning Agents [3.8520321531809705]
Methods that help users understand the behavior of such agents can roughly be divided into local explanations and global explanations.
We study a novel combination of local and global explanations for reinforcement learning agents.
arXiv Detail & Related papers (2022-10-21T08:57:46Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents are agents that employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- "I Don't Think So": Disagreement-Based Policy Summaries for Comparing Agents [2.6270468656705765]
We propose a novel method for generating contrastive summaries that highlight the differences between agents' policies.
Our results show that the novel disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS.
arXiv Detail & Related papers (2021-02-05T09:09:00Z)
- What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes [30.056732656973637]
We present a novel form of explanation for Reinforcement Learning, based around the notion of intended outcome.
These explanations describe the outcome an agent is trying to achieve by its actions.
We provide a simple proof that general methods for post-hoc explanations of this nature are impossible in traditional reinforcement learning.
arXiv Detail & Related papers (2020-11-10T12:05:08Z)
- Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps [4.568911586155097]
We combine global and local explanations for reinforcement learning agents.
We augment strategy summaries that extract important trajectories of states from simulations with saliency maps.
We find mixed results with respect to augmenting demonstrations with saliency maps.
arXiv Detail & Related papers (2020-05-18T16:44:55Z)
- Incentivizing Exploration with Selective Data Disclosure [94.32975679779491]
We propose and design recommendation systems that incentivize efficient exploration.
Agents arrive sequentially, choose actions and receive rewards, drawn from fixed but unknown action-specific distributions.
We attain optimal regret rate for exploration using a flexible frequentist behavioral model.
arXiv Detail & Related papers (2018-11-14T19:29:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.