Counterfactual State Explanations for Reinforcement Learning Agents via
Generative Deep Learning
- URL: http://arxiv.org/abs/2101.12446v1
- Date: Fri, 29 Jan 2021 07:43:41 GMT
- Title: Counterfactual State Explanations for Reinforcement Learning Agents via
Generative Deep Learning
- Authors: Matthew L. Olson, Roli Khanna, Lawrence Neal, Fuxin Li, Weng-Keen Wong
- Abstract summary: We focus on generating counterfactual explanations for deep reinforcement learning (RL) agents which operate in visual input environments like Atari.
We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning.
Our results indicate that counterfactual state explanations have sufficient fidelity to the actual game images to enable non-experts to more effectively identify a flawed RL agent.
- Score: 27.67522513615264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations, which deal with "why not?" scenarios, can
provide insightful explanations of an AI agent's behavior. In this work, we
focus on generating counterfactual explanations for deep reinforcement learning
(RL) agents which operate in visual input environments like Atari. We introduce
counterfactual state explanations, a novel example-based approach to
counterfactual explanations based on generative deep learning. Specifically, a
counterfactual state illustrates what minimal change is needed to an Atari game
image such that the agent chooses a different action. We also evaluate the
effectiveness of counterfactual states on human participants who are not
machine learning experts. Our first user study investigates whether humans can
discern if the counterfactual state explanations are produced by the actual
game or by a generative deep learning approach. Our second user study
investigates if counterfactual state explanations can help non-expert
participants identify a flawed agent; we compare against a baseline approach
based on a nearest neighbor explanation which uses images from the actual game.
Our results indicate that counterfactual state explanations have sufficient
fidelity to the actual game images to enable non-experts to more effectively
identify a flawed RL agent compared to the nearest neighbor baseline and to
having no explanation at all.
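A minimal sketch of the underlying idea, under loose assumptions: given a pre-trained agent policy (mapping a frame tensor to action logits) and a pre-trained frame autoencoder, one can search the latent space for a nearby code whose decoded frame makes the agent prefer a different action. The `agent`, `encode`, and `decode` callables below are hypothetical stand-ins, and the paper's actual generative pipeline is more elaborate; this is only an illustration, not the authors' implementation.

```python
# Hedged sketch: gradient search in a (hypothetical) autoencoder's latent space
# for a counterfactual state. `agent`, `encode`, and `decode` are stand-ins.
import torch
import torch.nn.functional as F

def counterfactual_state(frame, agent, encode, decode, target_action,
                         dist_weight=1.0, steps=200, lr=0.05):
    """Find a minimally changed frame on which `agent` prefers `target_action`.

    frame:         tensor of shape (1, C, H, W), the original game image
    target_action: tensor of shape (1,), the action the counterfactual should induce
    """
    with torch.no_grad():
        z0 = encode(frame)                     # latent code of the original frame
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        cf_frame = decode(z)                   # candidate counterfactual frame
        logits = agent(cf_frame)               # agent's action preferences on it
        # Push the agent toward the counterfactual action...
        action_loss = F.cross_entropy(logits, target_action)
        # ...while keeping the latent (and hence the frame) close to the original.
        dist_loss = F.mse_loss(z, z0)
        loss = action_loss + dist_weight * dist_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decode(z).detach()
```

The `dist_weight` term trades off how strongly the counterfactual frame must stay close to the original against how strongly the agent is pushed toward the target action.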
Related papers
- GANterfactual-RL: Understanding Reinforcement Learning Agents'
Strategies through Visual Counterfactual Explanations [0.7874708385247353]
We propose a novel but simple method to generate counterfactual explanations for RL agents.
Our method is fully model-agnostic, and we demonstrate that it outperforms the only previous method on several computational metrics.
arXiv Detail & Related papers (2023-02-24T15:29:43Z)
- Experiential Explanations for Reinforcement Learning [15.80179578318569]
Reinforcement Learning systems can be complex and non-interpretable.
We propose a technique, Experiential Explanations, to generate counterfactual explanations.
arXiv Detail & Related papers (2022-10-10T14:27:53Z)
- Learning to Scaffold: Optimizing Model Explanations for Teaching [74.25464914078826]
We train models on three natural language processing and computer vision tasks.
We find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than students trained with explanations produced by previous methods.
arXiv Detail & Related papers (2022-04-22T16:43:39Z)
- Visual Abductive Reasoning [85.17040703205608]
Abductive reasoning seeks the likeliest possible explanation for partial observations.
We propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations.
arXiv Detail & Related papers (2022-03-26T10:17:03Z)
- Explaining Deep Reinforcement Learning Agents In The Atari Domain through a Surrogate Model [78.69367679848632]
We describe a lightweight and effective method to derive explanations for deep RL agents.
Our method relies on a transformation of the pixel-based input of the RL agent to an interpretable, percept-like input representation.
We then train a surrogate model, which is itself interpretable, to replicate the behavior of the target deep RL agent.
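As a generic illustration of that two-step recipe (not the cited paper's code), one could log the deep agent's actions on a set of frames, map each frame to a small interpretable feature vector, and fit a shallow decision tree to imitate the agent; the helper below assumes such logged pairs already exist.

```python
# Generic sketch of an interpretable surrogate: fit a shallow decision tree on
# (percept-feature, agent-action) pairs logged from the deep RL agent.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def fit_surrogate(percepts, agent_actions, feature_names=None, max_depth=4):
    """percepts:      array-like of shape (n_frames, n_features), interpretable features
    agent_actions: array-like of shape (n_frames,), actions the deep agent chose."""
    X = np.asarray(percepts, dtype=float)
    y = np.asarray(agent_actions)
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
    # export_text yields a human-readable rule list approximating the agent's policy.
    return tree, export_text(tree, feature_names=feature_names)
```

The surrogate's agreement with the agent on held-out frames would then indicate how far the printed rules can be trusted as an explanation.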
arXiv Detail & Related papers (2021-10-07T05:01:44Z)
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games, and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better outcomes regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes [30.056732656973637]
We present a novel form of explanation for Reinforcement Learning, based around the notion of intended outcome.
These explanations describe the outcome an agent is trying to achieve by its actions.
We provide a simple proof that general methods for post-hoc explanations of this nature are impossible in traditional reinforcement learning.
arXiv Detail & Related papers (2020-11-10T12:05:08Z)
- On Generating Plausible Counterfactual and Semi-Factual Explanations for Deep Learning [15.965337956587373]
PlausIble Exceptionality-based Contrastive Explanations (PIECE) modifies all exceptional features in a test image to be normal from the perspective of the counterfactual class.
Two controlled experiments compare PIECE to others in the literature, showing that PIECE not only generates the most plausible counterfactuals on several measures, but also the best semi-factuals.
arXiv Detail & Related papers (2020-09-10T14:48:12Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aimed at attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.