Integrating Policy Summaries with Reward Decomposition for Explaining
Reinforcement Learning Agents
- URL: http://arxiv.org/abs/2210.11825v1
- Date: Fri, 21 Oct 2022 08:57:46 GMT
- Title: Integrating Policy Summaries with Reward Decomposition for Explaining
Reinforcement Learning Agents
- Authors: Yael Septon, Tobias Huber, Elisabeth André, Ofra Amir
- Abstract summary: Methods that help users understand the behavior of such agents can roughly be divided into local explanations and global explanations.
We study a novel combination of local and global explanations for reinforcement learning agents.
- Score: 3.8520321531809705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining the behavior of reinforcement learning agents operating in
sequential decision-making settings is challenging, as their behavior is
affected by a dynamic environment and delayed rewards. Methods that help users
understand the behavior of such agents can roughly be divided into local
explanations that analyze specific decisions of the agents and global
explanations that convey the general strategy of the agents. In this work, we
study a novel combination of local and global explanations for reinforcement
learning agents. Specifically, we combine reward decomposition, a local
explanation method that exposes which components of the reward function
influenced a specific decision, and HIGHLIGHTS, a global explanation method
that shows a summary of the agent's behavior in decisive states. We conducted
two user studies to evaluate the integration of these explanation methods and
their respective benefits. Our results show significant benefits for both
methods. In general, we found that the local reward decomposition was more
useful for identifying the agents' priorities. However, when there was only a
minor difference between the agents' preferences, then the global information
provided by HIGHLIGHTS additionally improved participants' understanding.
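To make the two explanation styles concrete, here is a minimal, hypothetical sketch of how they can be computed from a decomposed Q-function. The component names, dictionary-based Q-tables, and function signatures are illustrative assumptions, not the paper's implementation: reward decomposition reports the per-component Q-values behind a chosen action (local), while HIGHLIGHTS ranks states by the gap between the best and worst action's value (global).

```python
# Illustrative sketch (not the paper's code): q_components maps a reward
# component name (e.g. "goal", "safety") to a Q-table keyed by (state, action).

def choose_action(q_components, state, actions):
    """Pick the action maximizing the sum of the decomposed Q-values."""
    totals = {a: sum(q[(state, a)] for q in q_components.values())
              for a in actions}
    return max(totals, key=totals.get)

def reward_decomposition(q_components, state, action):
    """Local explanation: each reward component's contribution to the action."""
    return {c: q[(state, action)] for c, q in q_components.items()}

def highlights_importance(q_total, state, actions):
    """HIGHLIGHTS-style state importance: how much the action choice matters,
    measured as the gap between the best and worst action's total Q-value."""
    vals = [q_total[(state, a)] for a in actions]
    return max(vals) - min(vals)
```

In a summary, the states with the largest `highlights_importance` would be shown to the user, and `reward_decomposition` would annotate why the agent acted as it did in each of them.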
Related papers
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative Step-Level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Causal State Distillation for Explainable Reinforcement Learning [16.998047658978482]
Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be challenging.
Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD).
RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner.
We present an extension of RD that goes beyond sub-rewards to provide more informative explanations.
arXiv Detail & Related papers (2023-12-30T00:01:22Z)
- Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes [9.108253909440489]
We propose COViz, a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one.
In contrast to most local explanations that provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes.
arXiv Detail & Related papers (2023-12-18T11:34:58Z)
- Experiential Explanations for Reinforcement Learning [15.80179578318569]
Reinforcement Learning systems can be complex and non-interpretable.
We propose a technique, Experiential Explanations, to generate counterfactual explanations.
arXiv Detail & Related papers (2022-10-10T14:27:53Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Learning Domain Invariant Representations for Generalizable Person Re-Identification [71.35292121563491]
Generalizable person Re-Identification (ReID) has attracted growing attention in the computer vision community.
We introduce causality into person ReID and propose a novel generalizable framework, named Domain Invariant Representations for generalizable person Re-Identification (DIR-ReID).
arXiv Detail & Related papers (2021-03-29T18:59:48Z)
- "I Don't Think So": Disagreement-Based Policy Summaries for Comparing Agents [2.6270468656705765]
We propose a novel method for generating contrastive summaries that highlight the differences between agents' policies.
Our results show that the novel disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS.
arXiv Detail & Related papers (2021-02-05T09:09:00Z)
- Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
- Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps [4.568911586155097]
We combine global and local explanations for reinforcement learning agents.
We augment strategy summaries that extract important trajectories of states from simulations with saliency maps.
We find mixed results with respect to augmenting demonstrations with saliency maps.
arXiv Detail & Related papers (2020-05-18T16:44:55Z)
- Incentivizing Exploration with Selective Data Disclosure [94.32975679779491]
We propose and design recommendation systems that incentivize efficient exploration.
Agents arrive sequentially, choose actions and receive rewards, drawn from fixed but unknown action-specific distributions.
We attain optimal regret rate for exploration using a flexible frequentist behavioral model.
arXiv Detail & Related papers (2018-11-14T19:29:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.