Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning
- URL: http://arxiv.org/abs/2309.01458v1
- Date: Mon, 4 Sep 2023 09:09:54 GMT
- Title: Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning
- Authors: Qisen Yang, Huanqian Wang, Mukun Tong, Wenjie Shi, Gao Huang, Shiji
Song
- Abstract summary: It is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents.
We propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
- Score: 69.19840497497503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The black-box nature of deep reinforcement learning (RL) agents hinders
their deployment in real-world applications. Therefore, interpreting and explaining RL agents have
been active research topics in recent years. Existing methods for post-hoc
explanations usually adopt the action matching principle to enable an easy
understanding of vision-based RL agents. In this paper, it is argued that the
commonly used action matching principle is more like an explanation of deep
neural networks (DNNs) than the interpretation of RL agents. It may lead to
irrelevant or misplaced feature attribution when different DNNs' outputs lead
to the same rewards or different rewards result from the same outputs.
Therefore, we propose to consider rewards, the essential objective of RL
agents, as the essential objective of interpreting RL agents as well. To ensure
reward consistency during interpretable feature discovery, a novel framework
(RL interpreting RL, denoted as RL-in-RL) is proposed to solve the gradient
disconnection from actions to rewards. We verify and evaluate our method on the
Atari 2600 games as well as Duckietown, a challenging self-driving car
simulator environment. The results show that our method manages to keep reward
(or return) consistency and achieves high-quality feature attribution. Further,
a series of analytical experiments validate our assumption of the action
matching principle's limitations.
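As a rough illustration of the distinction the abstract draws, the sketch below computes input-gradient saliency for the same actor-critic network twice: once with respect to the chosen action's logit (the action matching view) and once with respect to the value head's estimated return, which is closer in spirit to reward consistency. This is a minimal sketch under assumed architecture, input shape, and aggregation choices; it is not the paper's RL-in-RL framework.

```python
# Minimal sketch (NOT the paper's RL-in-RL framework): contrast saliency for
# the action output with saliency for the estimated return. All names and
# shapes below are illustrative assumptions.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.policy = nn.LazyLinear(n_actions)  # action logits
        self.value = nn.LazyLinear(1)           # estimated return

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy(h), self.value(h)

def saliency(model: ActorCritic, obs: torch.Tensor, target: str) -> torch.Tensor:
    """Input-gradient saliency map for one stacked-frame observation."""
    obs = obs.clone().requires_grad_(True)
    logits, value = model(obs)
    if target == "action":
        # Action-matching style: explain the greedy action's logit.
        score = logits[0, int(logits.argmax(dim=1))]
    else:
        # Reward-consistency style: explain the estimated return instead.
        score = value[0, 0]
    score.backward()
    return obs.grad.abs().sum(dim=1)  # aggregate over frame channels

if __name__ == "__main__":
    model = ActorCritic(n_actions=6)
    frame = torch.rand(1, 4, 84, 84)  # e.g. 4 stacked 84x84 Atari frames
    action_map = saliency(model, frame, target="action")
    return_map = saliency(model, frame, target="return")
    # Regions where the two maps disagree are exactly where action matching
    # may attribute features that have no bearing on the obtained reward.
    print(action_map.shape, return_map.shape)
```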
Related papers
- Semifactual Explanations for Reinforcement Learning [1.5320737596132754]
Reinforcement Learning (RL) is a learning paradigm in which the agent learns from its environment through trial and error.
Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret.
Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks.
arXiv Detail & Related papers (2024-09-09T08:37:47Z) - Survival Instinct in Offline Reinforcement Learning [28.319886852612672]
Offline RL can produce well-performing and safe policies even when trained with "wrong" reward labels.
We demonstrate that this surprising property is attributable to an interplay between the notion of pessimism in offline RL algorithms and certain implicit biases in common data collection practices.
Our empirical and theoretical results suggest a new paradigm for RL, whereby an agent is nudged to learn a desirable behavior with imperfect reward but purposely biased data coverage.
arXiv Detail & Related papers (2023-06-05T22:15:39Z) - RACCER: Towards Reachable and Certain Counterfactual Explanations for
Reinforcement Learning [2.0341936392563063]
We propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents.
We use a tree search to find the most suitable counterfactuals based on the defined properties.
We evaluate RACCER in two tasks as well as conduct a user study to show that RL-specific counterfactuals help users better understand agents' behavior.
arXiv Detail & Related papers (2023-03-08T09:47:00Z) - A Survey on Explainable Reinforcement Learning: Concepts, Algorithms,
Challenges [38.70863329476517]
Reinforcement Learning (RL) is a popular machine learning paradigm where intelligent agents interact with the environment to fulfill a long-term goal.
Despite the encouraging results achieved, the deep neural network-based backbone is widely deemed a black box that impedes practitioners from trusting and employing trained agents in realistic scenarios where high security and reliability are essential.
To alleviate this issue, a large body of literature has been devoted to shedding light on the inner workings of intelligent agents, either by constructing intrinsic interpretability or by providing post-hoc explainability.
arXiv Detail & Related papers (2022-11-12T13:52:06Z) - Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty via uncertainty in the learned reward (a minimal sketch of this idea appears after the Related papers list).
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - Explaining Deep Reinforcement Learning Agents In The Atari Domain
through a Surrogate Model [78.69367679848632]
We describe a lightweight and effective method to derive explanations for deep RL agents.
Our method relies on a transformation of the pixel-based input of the RL agent to an interpretable, percept-like input representation.
We then train a surrogate model, which is itself interpretable, to replicate the behavior of the target, deep RL agent.
arXiv Detail & Related papers (2021-10-07T05:01:44Z) - Decoupling Exploration and Exploitation in Reinforcement Learning [8.946655323517092]
We propose Decoupled RL (DeRL) which trains separate policies for exploration and exploitation.
We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards.
arXiv Detail & Related papers (2021-07-19T15:31:02Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Maximizing Information Gain in Partially Observable Environments via
Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
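As a companion to the "Reward Uncertainty for Exploration in Preference-based Reinforcement Learning" entry above, the sketch below illustrates the general idea of turning disagreement across an ensemble of learned reward models into an exploration bonus. The network sizes, names, and combination rule are assumptions for illustration, not that paper's exact formulation.

```python
# Hedged sketch: ensemble disagreement in a learned reward model used as an
# exploration bonus. Sizes and the combination rule are illustrative.
import torch
import torch.nn as nn

class RewardEnsemble(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, n_members: int = 3):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                          nn.Linear(64, 1))
            for _ in range(n_members)
        ])

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        x = torch.cat([obs, act], dim=-1)
        # Stack every member's reward prediction: [n_members, batch, 1]
        return torch.stack([m(x) for m in self.members], dim=0)

def reward_with_bonus(ensemble: RewardEnsemble, obs, act, beta: float = 0.1):
    preds = ensemble(obs, act)
    mean_reward = preds.mean(dim=0)  # consensus estimate of the learned reward
    uncertainty = preds.std(dim=0)   # ensemble disagreement as a novelty signal
    return mean_reward + beta * uncertainty

if __name__ == "__main__":
    ens = RewardEnsemble(obs_dim=8, act_dim=2)
    obs, act = torch.rand(32, 8), torch.rand(32, 2)
    print(reward_with_bonus(ens, obs, act).shape)  # torch.Size([32, 1])
```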
This list is automatically generated from the titles and abstracts of the papers on this site.