Explaining Deep Reinforcement Learning Agents In The Atari Domain
through a Surrogate Model
- URL: http://arxiv.org/abs/2110.03184v1
- Date: Thu, 7 Oct 2021 05:01:44 GMT
- Title: Explaining Deep Reinforcement Learning Agents In The Atari Domain
through a Surrogate Model
- Authors: Alexander Sieusahai and Matthew Guzdial
- Abstract summary: We describe a lightweight and effective method to derive explanations for deep RL agents.
Our method relies on a transformation of the pixel-based input of the RL agent to an interpretable, percept-like input representation.
We then train a surrogate model, which is itself interpretable, to replicate the behavior of the target, deep RL agent.
- Score: 78.69367679848632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One major barrier to applications of deep Reinforcement Learning (RL) both
inside and outside of games is the lack of explainability. In this paper, we
describe a lightweight and effective method to derive explanations for deep RL
agents, which we evaluate in the Atari domain. Our method relies on a
transformation of the pixel-based input of the RL agent to an interpretable,
percept-like input representation. We then train a surrogate model, which is
itself interpretable, to replicate the behavior of the target, deep RL agent.
Our experiments demonstrate that we can learn an effective surrogate that
accurately approximates the underlying decision making of a target agent on a
suite of Atari games.
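The following is a minimal, self-contained sketch of the pipeline the abstract describes: percept-like features stand in for the pixel transformation, a hand-coded policy stands in for the deep RL agent, and a decision tree serves as the interpretable surrogate. All names and data here are synthetic stand-ins, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic "percepts": (ball_x, ball_y, paddle_x) per frame, stand-ins
# for object positions a sprite detector would extract from Atari pixels.
percepts = rng.uniform(0, 160, size=(5000, 3))

# Stand-in "agent": move the paddle toward the ball (0 = left, 1 = right).
agent_actions = (percepts[:, 0] > percepts[:, 2]).astype(int)

# Train the interpretable surrogate to replicate the agent's behavior.
surrogate = DecisionTreeClassifier(max_depth=4).fit(percepts, agent_actions)
print("fidelity:", surrogate.score(percepts, agent_actions))

# Explanations fall out of the surrogate's decision paths.
print(export_text(surrogate, feature_names=["ball_x", "ball_y", "paddle_x"]))
```

A real run would replace the synthetic percepts with detected sprite positions and the stand-in policy with the target agent's logged actions; fidelity on held-out frames then measures how closely the surrogate approximates the agent.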
Related papers
- Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the deep neural network (DNN) itself more than it interprets the RL agent.
We propose instead to take rewards, the essential objective of RL agents, as the basis for interpreting them.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
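A minimal sketch of one way to realize this reward-centric view, assuming a frozen Q-network as the target agent; the mask parameterization, sparsity weight, and training loop below are illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 16, 4
q_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
q_net.requires_grad_(False)  # frozen target agent (random stand-in here)

mask_logits = nn.Parameter(torch.zeros(obs_dim))
opt = torch.optim.Adam([mask_logits], lr=1e-2)
obs = torch.randn(256, obs_dim)

for _ in range(200):
    mask = torch.sigmoid(mask_logits)
    # Reward/value consistency: the masked input should preserve the
    # agent's Q-values, not merely its argmax action; the sparsity term
    # keeps only the features essential to that objective.
    loss = (q_net(obs) - q_net(obs * mask)).pow(2).mean() + 0.05 * mask.mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("per-feature relevance:", torch.sigmoid(mask_logits).data)
```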
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- Agent-Controller Representations: Principled Offline RL with Rich
Exogenous Information [49.06422815335159]
Learning to control an agent from data collected offline is vital for real-world applications of reinforcement learning (RL).
This paper introduces offline RL benchmarks designed to study this problem.
We find that contemporary representation learning techniques can fail on datasets where the noise is a complex, time-dependent process.
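A toy construction (mine, not the paper's benchmarks) of the failure mode described above: observations concatenate the controllable state with a temporally correlated exogenous process, which representation learners tend to latch onto because it is highly predictable.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d_state, d_noise = 1000, 4, 12

state = rng.normal(size=(T, d_state))      # agent-relevant dynamics
noise = np.zeros((T, d_noise))             # exogenous AR(1) process
for t in range(1, T):
    noise[t] = 0.95 * noise[t - 1] + 0.1 * rng.normal(size=d_noise)

obs = np.concatenate([state, noise], axis=1)
# The noise channels dominate the observation and are almost perfectly
# predictable one step ahead, yet carry no control-relevant signal.
print(obs.shape, round(np.corrcoef(noise[:-1, 0], noise[1:, 0])[0, 1], 3))
```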
arXiv Detail & Related papers (2022-10-31T22:12:48Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
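A minimal sketch of the retrieval step, assuming precomputed embeddings of a past-experience dataset; the nearest-neighbor lookup and concatenation below are schematic, not the paper's R2D2 integration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_past, k = 8, 10_000, 4

past_emb = rng.normal(size=(n_past, d))    # embeddings of past experiences
past_info = rng.normal(size=(n_past, 3))   # stored per-experience context

def retrieve(query_emb: np.ndarray) -> np.ndarray:
    """Return auxiliary info from the k nearest past experiences."""
    dists = np.linalg.norm(past_emb - query_emb, axis=1)
    idx = np.argpartition(dists, k)[:k]
    return past_info[idx].ravel()

query = rng.normal(size=d)                 # embedding of the current state
augmented = np.concatenate([query, retrieve(query)])
print(augmented.shape)  # the agent now conditions on state + retrieved context
```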
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills, evidenced by clear phase transitions.
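A schematic sketch of the two-player objective, using observation log-variance as a crude surprise proxy; the actual method trains both policies with RL inside alternating episode phases.

```python
import numpy as np

rng = np.random.default_rng(3)

def surprise(obs_batch: np.ndarray) -> float:
    """Crude surprise proxy: summed log-variance of a phase's observations."""
    return float(np.log(obs_batch.var(axis=0) + 1e-8).sum())

# One episode alternates control between the two players.
explorer_obs = rng.normal(scale=2.0, size=(64, 4))  # Explorer seeks novelty
control_obs = rng.normal(scale=0.2, size=(64, 4))   # Controller seeks order

r_explorer = surprise(explorer_obs)    # Explorer maximizes surprise
r_controller = -surprise(control_obs)  # Controller minimizes it (zero-sum)
print(r_explorer, r_controller)
```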
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Counterfactual State Explanations for Reinforcement Learning Agents via
Generative Deep Learning [27.67522513615264]
We focus on generating counterfactual explanations for deep reinforcement learning (RL) agents which operate in visual input environments like Atari.
We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning.
Our results indicate that counterfactual state explanations have sufficient fidelity to the actual game images to enable non-experts to more effectively identify a flawed RL agent.
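A minimal sketch of the counterfactual search, with small random networks standing in for the paper's generative model and deep RL agent: optimize a latent code so the decoded state stays near the original while the frozen agent switches to a chosen action.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, obs_dim, n_actions = 8, 32, 4
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim))
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
decoder.requires_grad_(False)  # frozen generative model (random stand-in)
policy.requires_grad_(False)   # frozen target agent (random stand-in)

z0 = torch.randn(latent_dim)          # latent code of the state to explain
z = z0.clone().requires_grad_(True)
target = torch.tensor([2])            # desired counterfactual action
opt = torch.optim.Adam([z], lr=1e-2)

for _ in range(300):
    logits = policy(decoder(z)).unsqueeze(0)
    # Flip the agent's action while staying close to the original state.
    loss = F.cross_entropy(logits, target) + 0.1 * (z - z0).pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()

print("counterfactual action:", policy(decoder(z)).argmax().item())
```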
arXiv Detail & Related papers (2021-01-29T07:43:41Z)
- Shielding Atari Games with Bounded Prescience [8.874011540975715]
We present the first exact method for analysing and ensuring the safety of DRL agents for Atari games.
First, we give a set of 43 properties that characterise "safe behaviour" for 30 games.
Second, we develop a method for exploring all traces induced by an agent and a game.
Third, we propose a countermeasure that combines a bounded explicit-state exploration with shielding.
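A toy sketch of the bounded-lookahead shield on a hypothetical integer environment; the paper applies the same pattern to actual Atari emulator states with its hand-written safety properties.

```python
import copy

class ToyEnv:
    """Hypothetical environment; state < 0 counts as unsafe."""
    def __init__(self, state=3): self.state = state
    def step(self, action): self.state += action   # action in {-1, 0, +1}

def violates_safety(state) -> bool:
    return state < 0   # stand-in for one "safe behaviour" property

def shielded_actions(env, actions, horizon=3):
    safe = []
    for a in actions:
        sim = copy.deepcopy(env)       # bounded explicit-state exploration
        sim.step(a)
        ok = not violates_safety(sim.state)
        for _ in range(horizon - 1):   # pessimistic continuation: always -1
            sim.step(-1)
            ok = ok and not violates_safety(sim.state)
        if ok:
            safe.append(a)
    return safe or [max(actions)]      # fallback if nothing certified safe

print(shielded_actions(ToyEnv(state=1), actions=[-1, 0, 1]))  # -> [1]
```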
arXiv Detail & Related papers (2021-01-20T14:22:04Z)
- Self-Supervised Discovering of Interpretable Features for Reinforcement
Learning [40.52278913726904]
We propose a self-supervised interpretable framework for deep reinforcement learning.
A self-supervised interpretable network (SSINet) is employed to produce fine-grained attention masks for highlighting task-relevant information.
We verify and evaluate our method on several Atari 2600 games as well as Duckietown, which is a challenging self-driving car simulator environment.
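A minimal sketch of the self-supervised masking objective, with small dense networks in place of SSINet's encoder-decoder: the mask network is trained so that the frozen agent repeats its own actions on the masked input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions = 16, 4
agent = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
agent.requires_grad_(False)   # frozen pretrained agent (random stand-in)

mask_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                         nn.Linear(32, obs_dim), nn.Sigmoid())
opt = torch.optim.Adam(mask_net.parameters(), lr=1e-3)

obs = torch.randn(512, obs_dim)
labels = agent(obs).argmax(dim=1)     # the agent's own actions as labels
for _ in range(200):
    mask = mask_net(obs)              # fine-grained attention mask
    loss = F.cross_entropy(agent(obs * mask), labels) + 0.01 * mask.mean()
    opt.zero_grad(); loss.backward(); opt.step()
```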
arXiv Detail & Related papers (2020-03-16T08:26:17Z)
- Learn to Interpret Atari Agents [106.21468537372995]
Our proposed agent, named region-sensitive Rainbow (RS-Rainbow), is an end-to-end trainable network based on the original Rainbow, a powerful deep Q-network agent.
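A hedged sketch of what region-sensitive attention over convolutional features can look like; the module below is illustrative, not the RS-Rainbow architecture.

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 64, 7, 7)            # conv features for one frame
score = nn.Conv2d(64, 1, kernel_size=1)    # one relevance score per region
attn = torch.softmax(score(feat).flatten(2), dim=-1)   # (1, 1, 49)
pooled = (feat.flatten(2) * attn).sum(dim=-1)           # (1, 64)
q_head = nn.Linear(64, 18)                 # 18 = full Atari action set
print(q_head(pooled).shape)                # Q-values; attn is inspectable
```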
arXiv Detail & Related papers (2018-12-29T03:35:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.