What Did You Think Would Happen? Explaining Agent Behaviour Through
Intended Outcomes
- URL: http://arxiv.org/abs/2011.05064v1
- Date: Tue, 10 Nov 2020 12:05:08 GMT
- Title: What Did You Think Would Happen? Explaining Agent Behaviour Through
Intended Outcomes
- Authors: Herman Yau, Chris Russell, Simon Hadfield
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel form of explanation for Reinforcement Learning, based
around the notion of intended outcome. These explanations describe the outcome
an agent is trying to achieve by its actions. We provide a simple proof that
general methods for post-hoc explanations of this nature are impossible in
traditional reinforcement learning. Rather, the information needed for the
explanations must be collected in conjunction with training the agent. We
derive approaches designed to extract local explanations based on intention for
several variants of Q-function approximation and prove consistency between the
explanations and the Q-values learned. We demonstrate our method on multiple
reinforcement learning problems, and provide code to help researchers introspect their RL environments and algorithms.
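To make the approach concrete, below is a minimal sketch of the core idea in the tabular case: alongside the Q-table, a belief map over future states is learned with the same TD machinery, so intended outcomes can be read off after training rather than reconstructed post hoc. The array shapes, hyperparameters, and the `explain` helper are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

# Hypothetical sketch: tabular Q-learning that learns a "belief map" B in
# conjunction with Q, so the agent's intended outcome (the states it expects
# to visit) is available for explanation after training.

n_states, n_actions = 16, 4   # assumed small grid-world dimensions
alpha, gamma = 0.1, 0.95      # assumed learning rate and discount

Q = np.zeros((n_states, n_actions))
# B[s, a] is a vector over states: discounted expected future-state occupancy.
B = np.zeros((n_states, n_actions, n_states))

def update(s, a, r, s_next, done):
    """One TD step; Q and B bootstrap through the same greedy next action,
    which is what keeps the explanations consistent with the learned Q-values."""
    a_next = Q[s_next].argmax()
    q_target = r if done else r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (q_target - Q[s, a])

    onehot = np.eye(n_states)[s_next]
    b_target = onehot if done else onehot + gamma * B[s_next, a_next]
    B[s, a] += alpha * (b_target - B[s, a])

def explain(s, a, top_k=3):
    """Intended outcome of taking action a in state s: the states the
    trained agent most expects to reach."""
    return np.argsort(B[s, a])[::-1][:top_k]
```

Because B is trained alongside Q, the explanation comes for free at query time; the paper's impossibility result says precisely that this information cannot be recovered from the Q-function alone after the fact.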
Related papers
- Selective Explanations [14.312717332216073]
Amortized explainers train a machine learning model to predict feature attribution scores in a single inference pass.
Despite this efficiency, amortized explainers can produce inaccurate predictions and misleading explanations.
We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z)
- GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies through Visual Counterfactual Explanations [0.7874708385247353]
We propose a novel but simple method to generate counterfactual explanations for RL agents.
Our method is fully model-agnostic and we demonstrate that it outperforms the only previous method on several computational metrics.
arXiv Detail & Related papers (2023-02-24T15:29:43Z)
- Complementary Explanations for Effective In-Context Learning [77.83124315634386]
Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts.
This work aims to better understand the mechanisms by which explanations are used for in-context learning.
arXiv Detail & Related papers (2022-11-25T04:40:47Z)
- Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities [2.0341936392563063]
Most explanation methods for AI are focused on developers and expert users.
Counterfactual explanations offer users advice on what can be changed in the input for the output of the black-box model to change.
Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system.
arXiv Detail & Related papers (2022-10-21T09:50:53Z)
- Learning to Scaffold: Optimizing Model Explanations for Teaching [74.25464914078826]
We train models on three natural language processing and computer vision tasks.
We find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than ones produced with previous methods.
arXiv Detail & Related papers (2022-04-22T16:43:39Z)
- Tell me why! -- Explanations support learning of relational and causal structure [24.434551113103105]
Explanations play a considerable role in human learning, especially in areas that remain major challenges for AI.
We show that reinforcement learning agents might likewise benefit from explanations.
Our results suggest that learning from explanations is a powerful principle that could offer a promising path towards training more robust and general machine learning systems.
arXiv Detail & Related papers (2021-12-07T15:09:06Z)
- Are We On The Same Page? Hierarchical Explanation Generation for Planning Tasks in Human-Robot Teaming using Reinforcement Learning [0.0]
We argue that agent-generated explanations should be abstracted to match the level of detail the human teammate desires, in order to keep the recipient's cognitive load manageable.
We show that hierarchical explanations achieved better task performance and behavior interpretability while reducing cognitive load.
arXiv Detail & Related papers (2020-12-22T02:14:52Z)
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
- Towards Interpretable Natural Language Understanding with Explanations as Latent Variables [146.83882632854485]
We develop a framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model.
arXiv Detail & Related papers (2020-10-24T02:05:56Z)
- Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [100.75479161884935]
We propose a novel training paradigm called Remembering for the Right Reasons (RRR).
RRR stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions.
We demonstrate that RRR can easily be added to any memory- or regularization-based approach and reduces forgetting; a rough sketch of the idea appears after this list.
arXiv Detail & Related papers (2020-10-04T10:05:27Z) - Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent work toward attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
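For the Remembering for the Right Reasons (RRR) entry above, here is the rough sketch referenced there: each buffered example carries the explanation (here, an input-gradient saliency map) recorded when it was stored, and training penalizes drift between the model's current saliency and the stored one. The saliency choice, the L1 penalty, and all names are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the RRR idea: replay examples together with the
# saliency maps stored when they entered the buffer, and penalize the model
# for explaining old examples differently than it used to.

def saliency(model, x, y):
    """Input-gradient saliency; create_graph=True keeps it differentiable
    so the drift penalty below can be backpropagated."""
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return grad.abs()

def rrr_loss(model, buffer, lam=1.0):
    """Replay loss plus an L1 penalty for forgetting the 'right reasons'.
    buffer holds (inputs, labels, stored_saliency) triples."""
    total = 0.0
    for x, y, stored in buffer:
        total = total + F.cross_entropy(model(x), y)
        total = total + lam * (saliency(model, x, y) - stored).abs().mean()
    return total
```

Adding `rrr_loss` to an existing continual-learning objective is the whole integration, which is why the approach composes with memory- or regularization-based methods.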