Integrating Policy Summaries with Reward Decomposition for Explaining
Reinforcement Learning Agents
- URL: http://arxiv.org/abs/2210.11825v1
- Date: Fri, 21 Oct 2022 08:57:46 GMT
- Title: Integrating Policy Summaries with Reward Decomposition for Explaining
Reinforcement Learning Agents
- Authors: Yael Septon, Tobias Huber, Elisabeth André, Ofra Amir
- Abstract summary: Methods that help users understand the behavior of such agents can roughly be divided into local explanations and global explanations.
We study a novel combination of local and global explanations for reinforcement learning agents.
- Score: 3.8520321531809705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining the behavior of reinforcement learning agents operating in
sequential decision-making settings is challenging, as their behavior is
affected by a dynamic environment and delayed rewards. Methods that help users
understand the behavior of such agents can roughly be divided into local
explanations that analyze specific decisions of the agents and global
explanations that convey the general strategy of the agents. In this work, we
study a novel combination of local and global explanations for reinforcement
learning agents. Specifically, we combine reward decomposition, a local
explanation method that exposes which components of the reward function
influenced a specific decision, and HIGHLIGHTS, a global explanation method
that shows a summary of the agent's behavior in decisive states. We conducted
two user studies to evaluate the integration of these explanation methods and
their respective benefits. Our results show significant benefits for both
methods. In general, we found that the local reward decomposition was more
useful for identifying the agents' priorities. However, when there was only a
minor difference between the agents' preferences, then the global information
provided by HIGHLIGHTS additionally improved participants' understanding.
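To make the two explanation styles concrete, here is a minimal, hypothetical sketch of how they can be computed from a decomposed Q-function. The component names, dictionary-based Q-tables, and function signatures are illustrative assumptions, not the paper's implementation: reward decomposition reports the per-component Q-values behind a chosen action (local), while HIGHLIGHTS ranks states by the gap between the best and worst action's value (global).

```python
# Illustrative sketch (not the paper's code): q_components maps a reward
# component name (e.g. "goal", "safety") to a Q-table keyed by (state, action).

def choose_action(q_components, state, actions):
    """Pick the action maximizing the sum of the decomposed Q-values."""
    totals = {a: sum(q[(state, a)] for q in q_components.values())
              for a in actions}
    return max(totals, key=totals.get)

def reward_decomposition(q_components, state, action):
    """Local explanation: each reward component's contribution to the action."""
    return {c: q[(state, action)] for c, q in q_components.items()}

def highlights_importance(q_total, state, actions):
    """HIGHLIGHTS-style state importance: how much the action choice matters,
    measured as the gap between the best and worst action's total Q-value."""
    vals = [q_total[(state, a)] for a in actions]
    return max(vals) - min(vals)
```

In a summary, the states with the largest `highlights_importance` would be shown to the user, and `reward_decomposition` would annotate why the agent acted as it did in each of them.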
Related papers
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative Step-Level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Causal State Distillation for Explainable Reinforcement Learning [16.998047658978482]
Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be challenging.
Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD).
RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner.
We present an extension of RD that goes beyond sub-rewards to provide more informative explanations.
arXiv Detail & Related papers (2023-12-30T00:01:22Z)
- Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes [9.108253909440489]
We propose COViz, a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one.
In contrast to most local explanations that provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes.
arXiv Detail & Related papers (2023-12-18T11:34:58Z)
- Experiential Explanations for Reinforcement Learning [15.80179578318569]
Reinforcement Learning systems can be complex and non-interpretable.
We propose a technique, Experiential Explanations, to generate counterfactual explanations.
arXiv Detail & Related papers (2022-10-10T14:27:53Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Learning Domain Invariant Representations for Generalizable Person Re-Identification [71.35292121563491]
Generalizable person Re-Identification (ReID) has attracted growing attention in the computer vision community.
We introduce causality into person ReID and propose a novel generalizable framework, named Domain Invariant Representations for generalizable person Re-Identification (DIR-ReID).
arXiv Detail & Related papers (2021-03-29T18:59:48Z)
- "I Don't Think So": Disagreement-Based Policy Summaries for Comparing Agents [2.6270468656705765]
We propose a novel method for generating contrastive summaries that highlight the differences between agents' policies.
Our results show that the novel disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS.
arXiv Detail & Related papers (2021-02-05T09:09:00Z)
- Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
- Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps [4.568911586155097]
We combine global and local explanations for reinforcement learning agents.
We augment strategy summaries that extract important trajectories of states from simulations with saliency maps.
We find mixed results with respect to augmenting demonstrations with saliency maps.
arXiv Detail & Related papers (2020-05-18T16:44:55Z)
- Incentivizing Exploration with Selective Data Disclosure [94.32975679779491]
We propose and design recommendation systems that incentivize efficient exploration.
Agents arrive sequentially, choose actions and receive rewards, drawn from fixed but unknown action-specific distributions.
We attain optimal regret rate for exploration using a flexible frequentist behavioral model.
arXiv Detail & Related papers (2018-11-14T19:29:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.