Counterfactual State Explanations for Reinforcement Learning Agents via
Generative Deep Learning
- URL: http://arxiv.org/abs/2101.12446v1
- Date: Fri, 29 Jan 2021 07:43:41 GMT
- Title: Counterfactual State Explanations for Reinforcement Learning Agents via
Generative Deep Learning
- Authors: Matthew L. Olson, Roli Khanna, Lawrence Neal, Fuxin Li, Weng-Keen Wong
- Abstract summary: We focus on generating counterfactual explanations for deep reinforcement learning (RL) agents which operate in visual input environments like Atari.
We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning.
Our results indicate that counterfactual state explanations have sufficient fidelity to the actual game images to enable non-experts to more effectively identify a flawed RL agent.
- Score: 27.67522513615264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations, which deal with "why not?" scenarios, can
provide insightful explanations of an AI agent's behavior. In this work, we
focus on generating counterfactual explanations for deep reinforcement learning
(RL) agents which operate in visual input environments like Atari. We introduce
counterfactual state explanations, a novel example-based approach to
counterfactual explanations based on generative deep learning. Specifically, a
counterfactual state illustrates what minimal change is needed to an Atari game
image such that the agent chooses a different action. We also evaluate the
effectiveness of counterfactual states on human participants who are not
machine learning experts. Our first user study investigates whether humans can
discern if the counterfactual state explanations are produced by the actual
game or by a generative deep learning approach. Our second user study
investigates if counterfactual state explanations can help non-expert
participants identify a flawed agent; we compare against a baseline approach
based on a nearest neighbor explanation which uses images from the actual game.
Our results indicate that counterfactual state explanations have sufficient
fidelity to the actual game images to enable non-experts to more effectively
identify a flawed RL agent compared to the nearest neighbor baseline and to
having no explanation at all.
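A minimal sketch of the underlying idea, under loose assumptions: given a pre-trained agent policy (mapping a frame tensor to action logits) and a pre-trained frame autoencoder, one can search the latent space for a nearby code whose decoded frame makes the agent prefer a different action. The `agent`, `encode`, and `decode` callables below are hypothetical stand-ins, and the paper's actual generative pipeline is more elaborate; this is only an illustration, not the authors' implementation.

```python
# Hedged sketch: gradient search in a (hypothetical) autoencoder's latent space
# for a counterfactual state. `agent`, `encode`, and `decode` are stand-ins.
import torch
import torch.nn.functional as F

def counterfactual_state(frame, agent, encode, decode, target_action,
                         dist_weight=1.0, steps=200, lr=0.05):
    """Find a minimally changed frame on which `agent` prefers `target_action`.

    frame:         tensor of shape (1, C, H, W), the original game image
    target_action: tensor of shape (1,), the action the counterfactual should induce
    """
    with torch.no_grad():
        z0 = encode(frame)                     # latent code of the original frame
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        cf_frame = decode(z)                   # candidate counterfactual frame
        logits = agent(cf_frame)               # agent's action preferences on it
        # Push the agent toward the counterfactual action...
        action_loss = F.cross_entropy(logits, target_action)
        # ...while keeping the latent (and hence the frame) close to the original.
        dist_loss = F.mse_loss(z, z0)
        loss = action_loss + dist_weight * dist_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decode(z).detach()
```

The `dist_weight` term trades off how strongly the counterfactual frame must stay close to the original against how strongly the agent is pushed toward the target action.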
Related papers
- GANterfactual-RL: Understanding Reinforcement Learning Agents'
Strategies through Visual Counterfactual Explanations [0.7874708385247353]
We propose a novel but simple method to generate counterfactual explanations for RL agents.
Our method is fully model-agnostic, and we demonstrate that it outperforms the only previous method on several computational metrics.
arXiv Detail & Related papers (2023-02-24T15:29:43Z)
- Experiential Explanations for Reinforcement Learning [15.80179578318569]
Reinforcement Learning systems can be complex and non-interpretable.
We propose a technique, Experiential Explanations, to generate counterfactual explanations.
arXiv Detail & Related papers (2022-10-10T14:27:53Z)
- Learning to Scaffold: Optimizing Model Explanations for Teaching [74.25464914078826]
We train models on three natural language processing and computer vision tasks.
We find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than students trained with explanations produced by previous methods.
arXiv Detail & Related papers (2022-04-22T16:43:39Z)
- Visual Abductive Reasoning [85.17040703205608]
Abductive reasoning seeks the likeliest possible explanation for partial observations.
We propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations.
arXiv Detail & Related papers (2022-03-26T10:17:03Z)
- Explaining Deep Reinforcement Learning Agents In The Atari Domain through a Surrogate Model [78.69367679848632]
We describe a lightweight and effective method to derive explanations for deep RL agents.
Our method relies on a transformation of the pixel-based input of the RL agent to an interpretable, percept-like input representation.
We then train a surrogate model, which is itself interpretable, to replicate the behavior of the target deep RL agent.
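As a generic illustration of that two-step recipe (not the cited paper's code), one could log the deep agent's actions on a set of frames, map each frame to a small interpretable feature vector, and fit a shallow decision tree to imitate the agent; the helper below assumes such logged pairs already exist.

```python
# Generic sketch of an interpretable surrogate: fit a shallow decision tree on
# (percept-feature, agent-action) pairs logged from the deep RL agent.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def fit_surrogate(percepts, agent_actions, feature_names=None, max_depth=4):
    """percepts:      array-like of shape (n_frames, n_features), interpretable features
    agent_actions: array-like of shape (n_frames,), actions the deep agent chose."""
    X = np.asarray(percepts, dtype=float)
    y = np.asarray(agent_actions)
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
    # export_text yields a human-readable rule list approximating the agent's policy.
    return tree, export_text(tree, feature_names=feature_names)
```

The surrogate's agreement with the agent on held-out frames would then indicate how far the printed rules can be trusted as an explanation.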
arXiv Detail & Related papers (2021-10-07T05:01:44Z)
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games, and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better outcomes regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes [30.056732656973637]
We present a novel form of explanation for Reinforcement Learning, based around the notion of intended outcome.
These explanations describe the outcome an agent is trying to achieve by its actions.
We provide a simple proof that general methods for post-hoc explanations of this nature are impossible in traditional reinforcement learning.
arXiv Detail & Related papers (2020-11-10T12:05:08Z)
- On Generating Plausible Counterfactual and Semi-Factual Explanations for Deep Learning [15.965337956587373]
PlausIble Exceptionality-based Contrastive Explanations (PIECE) modifies all exceptional features in a test image to be normal from the perspective of the counterfactual class.
Two controlled experiments compare PIECE to others in the literature, showing that PIECE not only generates the most plausible counterfactuals on several measures, but also the best semi-factuals.
arXiv Detail & Related papers (2020-09-10T14:48:12Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aimed at attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.