RACCER: Towards Reachable and Certain Counterfactual Explanations for
Reinforcement Learning
- URL: http://arxiv.org/abs/2303.04475v2
- Date: Tue, 10 Oct 2023 10:06:05 GMT
- Title: RACCER: Towards Reachable and Certain Counterfactual Explanations for
Reinforcement Learning
- Authors: Jasmina Gajcin and Ivana Dusparic
- Abstract summary: We propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents.
We use a tree search to find the most suitable counterfactuals based on the defined properties.
We evaluate RACCER in two tasks as well as conduct a user study to show that RL-specific counterfactuals help users better understand agents' behavior.
- Score: 2.0341936392563063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While reinforcement learning (RL) algorithms have been successfully applied
to numerous tasks, their reliance on neural networks makes their behavior
difficult to understand and trust. Counterfactual explanations are
human-friendly explanations that offer users actionable advice on how to alter
the model inputs to achieve the desired output from a black-box system.
However, current approaches to generating counterfactuals in RL ignore the
stochastic and sequential nature of RL tasks and can produce counterfactuals
that are difficult to obtain or do not deliver the desired outcome. In this
work, we propose RACCER, the first RL-specific approach to generating
counterfactual explanations for the behavior of RL agents. We first propose and
implement a set of RL-specific counterfactual properties that ensure easily
reachable counterfactuals with highly probable desired outcomes. We use a
heuristic tree search of the agent's execution trajectories to find the most
suitable counterfactuals based on the defined properties. We evaluate RACCER in
two tasks as well as conduct a user study to show that RL-specific
counterfactuals help users better understand agents' behavior compared to the
current state-of-the-art approaches.
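As a rough illustration of the search component, the sketch below runs a best-first tree search over simulated action sequences, scoring candidate states by depth (a proxy for reachability) and by the policy's probability of the desired action (a proxy for outcome certainty). The `env` and `policy` interfaces are hypothetical placeholders for illustration, not the authors' implementation.

```python
import heapq
import itertools

def counterfactual_search(env, policy, s0, target_action, max_depth=5):
    """Best-first search for the shallowest reachable state in which the
    agent's preferred action becomes `target_action`. Lower cost means
    fewer steps to reach (reachability) and a more probable desired
    outcome (certainty). States are assumed hashable."""
    tie = itertools.count()                    # tie-breaker so states are never compared
    frontier = [(0.0, next(tie), 0, s0, [])]   # (cost, tie, depth, state, action path)
    visited = set()
    while frontier:
        _, _, depth, s, path = heapq.heappop(frontier)
        if policy.best_action(s) == target_action:
            return s, path                     # counterfactual state and how to reach it
        if depth >= max_depth or s in visited:
            continue
        visited.add(s)
        for a in env.actions(s):
            s2 = env.step(s, a)                # one-step simulator, assumed deterministic here
            cost = (depth + 1) - policy.prob(s2, target_action)
            heapq.heappush(frontier, (cost, next(tie), depth + 1, s2, path + [a]))
    return None, None                          # no counterfactual within the search budget
```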
Related papers
- Semifactual Explanations for Reinforcement Learning [1.5320737596132754]
Reinforcement Learning (RL) is a learning paradigm in which the agent learns from its environment through trial and error.
Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret.
Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks.
arXiv Detail & Related papers (2024-09-09T08:37:47Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the deep neural network (DNN) itself rather than interpreting the behavior of the RL agent.
We propose instead to consider rewards, the essential objective of RL agents, as the guiding signal for interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
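In that spirit, a reward-consistency view of feature importance might perturb each observation feature and measure the shift in the agent's predicted return, rather than checking whether the greedy action still matches. The sketch below is our own illustration of that idea, not the paper's method; `q_net` is a hypothetical value network returning per-action values as a NumPy array.

```python
import numpy as np

def reward_saliency(q_net, obs, eps=1e-2):
    """Importance of each observation feature, measured by how much a small
    perturbation shifts the predicted return of the greedy action."""
    base = q_net(obs).max()                    # value of the greedy action
    scores = np.zeros(len(obs))
    for i in range(len(obs)):
        perturbed = obs.copy()
        perturbed[i] += eps                    # nudge one feature at a time
        scores[i] = abs(q_net(perturbed).max() - base) / eps
    return scores / (scores.sum() + 1e-8)      # normalized importance map
```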
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
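A standard building block in PbRL, and a reasonable mental model for this line of work (though not necessarily the paper's exact estimator), is a Bradley-Terry model of pairwise preferences over predicted trajectory returns:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_net, traj_a, traj_b, prefer_a):
    """Bradley-Terry preference loss: the probability that trajectory A is
    preferred is a logistic function of the difference in predicted returns."""
    r_a = reward_net(traj_a).sum()             # predicted return of trajectory A
    r_b = reward_net(traj_b).sum()             # predicted return of trajectory B
    target = torch.tensor(1.0 if prefer_a else 0.0)
    return F.binary_cross_entropy_with_logits(r_a - r_b, target)
```

Minimizing this loss over many labeled pairs fits a reward model consistent with the observed preferences, which downstream RL can then optimize.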
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges [38.70863329476517]
Reinforcement Learning (RL) is a popular machine learning paradigm where intelligent agents interact with the environment to fulfill a long-term goal.
Despite the encouraging results achieved, the deep neural network backbone is widely deemed a black box that prevents practitioners from trusting and employing trained agents in realistic scenarios where high security and reliability are essential.
To alleviate this issue, a large body of literature has been devoted to shedding light on the inner workings of intelligent agents, either by constructing intrinsic interpretability or through post-hoc explainability.
arXiv Detail & Related papers (2022-11-12T13:52:06Z)
- Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities [2.0341936392563063]
Most explanation methods for AI are focused on developers and expert users.
Counterfactual explanations offer users advice on what can be changed in the input for the output of the black-box model to change.
Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system.
arXiv Detail & Related papers (2022-10-21T09:50:53Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
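One common variational estimator for such mutual-information terms is the InfoNCE bound; the sketch below is a generic version of that idea (the paper's exact objective may differ), where rows of the hypothetical `z_pred` and `z_next` tensors are matched latent pairs from a batch.

```python
import torch
import torch.nn.functional as F

def info_nce_bound(z_pred, z_next):
    """InfoNCE estimator of the mutual information between predicted and
    actual next latents: each row must pick out its own partner among the
    rest of the batch, which serves as negatives."""
    logits = z_pred @ z_next.t()               # (B, B) pairwise similarities
    labels = torch.arange(z_next.size(0))      # matched pairs lie on the diagonal
    # Maximize this to tighten the MI lower bound (up to a log-batch-size constant).
    return -F.cross_entropy(logits, labels)
```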
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
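As a simplified picture of the retrieval step (the paper learns the retrieval process end-to-end; here the embeddings are assumed fixed and L2-normalized), a nearest-neighbor lookup over past experiences might look like:

```python
import numpy as np

def retrieve(query_emb, memory_embs, memory_data, k=5):
    """Return the k stored experiences whose embeddings are most similar to
    the current context; the agent would condition on these alongside its
    own observation."""
    sims = memory_embs @ query_emb             # cosine similarity (normalized rows)
    top = np.argsort(-sims)[:k]                # indices of the k best matches
    return [memory_data[i] for i in top]
```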
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
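A crude proxy for this kind of informative query selection (IDRL's actual criterion is information gain about the return of the optimal policy; ensemble disagreement is merely a stand-in we use for illustration) is to query the trajectory pair that the current reward models disagree on most:

```python
import numpy as np

def select_query(candidate_pairs, reward_ensemble):
    """Pick the trajectory pair whose preference the ensemble of reward
    models is most uncertain about (vote split closest to 50/50)."""
    def disagreement(pair):
        traj_a, traj_b = pair
        votes = [float(m(traj_a).sum() > m(traj_b).sum()) for m in reward_ensemble]
        p = np.mean(votes)                     # fraction of models preferring A
        return p * (1.0 - p)                   # maximized at p = 0.5
    return max(candidate_pairs, key=disagreement)
```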
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works that aim to attain Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
- Making Sense of Reinforcement Learning and Probabilistic Inference [15.987913388420667]
Reinforcement learning (RL) combines a control problem with statistical estimation.
We show that the popular 'RL as inference' approximation can perform poorly in even very basic problems.
We show that with a small modification the framework does yield algorithms that can provably perform well.
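To make the estimation/control coupling concrete, the toy sketch below runs Thompson sampling on a Bernoulli bandit, the kind of basic problem where naive 'RL as inference' approximations can fail while posterior sampling does well. This is illustrative only, not the paper's proposed algorithm.

```python
import numpy as np

def thompson_bandit(true_probs, horizon=1000, seed=0):
    """Thompson sampling on a Bernoulli bandit: sample plausible arm means
    from the Beta posterior, act greedily on the sample, update the posterior."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    wins, losses = np.ones(k), np.ones(k)      # Beta(1, 1) prior per arm
    total = 0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(wins, losses)))  # one posterior sample per arm
        reward = int(rng.random() < true_probs[arm])
        total += reward
        wins[arm] += reward
        losses[arm] += 1 - reward
    return total
```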
arXiv Detail & Related papers (2020-01-03T12:50:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.