Related papers: SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies

SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies

URL: http://arxiv.org/abs/2404.18326v1
Date: Sun, 28 Apr 2024 21:47:34 GMT
Title: SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
Authors: Amir Samadi, Konstantinos Koufos, Kurt Debattista, Mehrdad Dianati,
Abstract summary: A lack of explainability of learned policies impedes its uptake in safety-critical applications, such as automated driving systems. Counterfactual (CF) explanations have recently gained prominence for their ability to interpret black-box Deep Learning (DL) models. We propose using a saliency map to identify the most influential input pixels across the sequence of past observed states by the agent. We evaluate the effectiveness of our framework in diverse domains, including ADS, Atari Pong, Pacman and space-invaders games.
Score: 13.26174103650211
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While Deep Reinforcement Learning (DRL) has emerged as a promising solution for intricate control tasks, the lack of explainability of the learned policies impedes its uptake in safety-critical applications, such as automated driving systems (ADS). Counterfactual (CF) explanations have recently gained prominence for their ability to interpret black-box Deep Learning (DL) models. CF examples are associated with minimal changes in the input, resulting in a complementary output by the DL model. Finding such alternations, particularly for high-dimensional visual inputs, poses significant challenges. Besides, the temporal dependency introduced by the reliance of the DRL agent action on a history of past state observations further complicates the generation of CF examples. To address these challenges, we propose using a saliency map to identify the most influential input pixels across the sequence of past observed states by the agent. Then, we feed this map to a deep generative model, enabling the generation of plausible CFs with constrained modifications centred on the salient regions. We evaluate the effectiveness of our framework in diverse domains, including ADS, Atari Pong, Pacman and space-invaders games, using traditional performance metrics such as validity, proximity and sparsity. Experimental results demonstrate that this framework generates more informative and plausible CFs than the state-of-the-art for a wide range of environments and DRL agents. In order to foster research in this area, we have made our datasets and codes publicly available at https://github.com/Amir-Samadi/SAFE-RL.

Related papers

RAD: Retrieval High-quality Demonstrations to Enhance Decision-making [23.136426643341462]
offline reinforcement learning (RL) enables agents to learn policies from fixed datasets.<n>RL is often limited by dataset sparsity and the lack of transition overlap between suboptimal and expert trajectories.<n>We propose Retrieval High-quAlity Demonstrations (RAD) for decision-making, which combines non-parametric retrieval with diffusion-based generative modeling.
arXiv Detail & Related papers (2025-07-21T08:08:18Z)
Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets [87.62730694973696]
This paper introduces CRAFT, a sample-efficient algorithm leveraging differences in controllable feature dynamics across agents to learn representations. We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example.
arXiv Detail & Related papers (2025-03-26T22:05:57Z)
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning [9.025671446527694]
Reinforcement learning from human feedback (RLHF) has become a crucial step in building reliable generative AI models. This study is to develop a disciplined approach to fine-tune diffusion models using continuous-time RL.
arXiv Detail & Related papers (2025-02-03T20:50:05Z)
FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL [19.236153474365747]
Existing MARL approaches often rely on the restrictive assumption that the number of entities remains constant between training and inference. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization. We propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods.
arXiv Detail & Related papers (2024-10-21T10:57:45Z)
An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving [0.0]
Research investigates the challenges Deep Reinforcement Learning (DRL) faces in Partially Observable Markov Decision Processes (POMDP) Our research adopts an offline-trained encoder to leverage large video datasets through self-supervised learning to learn generalizable representations. We show that the features learned by watching BDD100K driving videos can be directly transferred to achieve lane following and collision avoidance in CARLA simulator.
arXiv Detail & Related papers (2024-09-02T14:16:23Z)
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
Decision Mamba is a novel multi-grained state space model with a self-evolving policy learning strategy. To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization. The policy evolves by using its own past knowledge to refine the suboptimal actions, thus enhancing its robustness on noisy demonstrations.
arXiv Detail & Related papers (2024-06-08T10:12:00Z)
Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification. We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations. Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
Episodic Reinforcement Learning with Expanded State-reward Space [1.479675621064679]
We introduce an efficient EC-based DRL framework with expanded state-reward space, where the expanded states used as the input and the expanded rewards used in the training both contain historical and current information. Our method is able to simultaneously achieve the full utilization of retrieval information and the better evaluation of state values by a Temporal Difference (TD) loss.
arXiv Detail & Related papers (2024-01-19T06:14:36Z)
Disentangled Contrastive Collaborative Filtering [36.400303346450514]
Graph contrastive learning (GCL) has exhibited powerful performance in addressing the supervision label shortage issue. We propose a Disentangled Contrastive Collaborative Filtering framework (DCCF) to realize intent disentanglement with self-supervised augmentation. Our DCCF is able to not only distill finer-grained latent factors from the entangled self-supervision signals but also alleviate the augmentation-induced noise.
arXiv Detail & Related papers (2023-05-04T11:53:38Z)
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition. We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information [49.06422815335159]
Learning to control an agent from data collected offline is vital for real-world applications of reinforcement learning (RL) This paper introduces offline RL benchmarks offering the ability to study this problem. We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time dependent process.
arXiv Detail & Related papers (2022-10-31T22:12:48Z)
Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel AE detection framework, named trustworthy for predictions. performs the detection by distinguishing the AE's abnormal relation with its augmented versions. An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
arXiv Detail & Related papers (2022-08-31T08:18:44Z)
ReLACE: Reinforcement Learning Agent for Counterfactual Explanations of Arbitrary Predictive Models [6.939617874336667]
We introduce a model-agnostic algorithm to generate optimal counterfactual explanations. Our method is easily applied to any black-box model, as this resembles the environment that the DRL agent interacts with. In addition, we develop an algorithm to extract explainable decision rules from the DRL agent's policy, so as to make the process of generating CFs itself transparent.
arXiv Detail & Related papers (2021-10-22T17:08:49Z)
Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Agenerative-based adversarial attacks can get rid of this limitation. ASymmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations. The adversarial examples generated by SSAE not only make thewidely-used models collapse, but also achieves good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.