Behaviour Discovery and Attribution for Explainable Reinforcement Learning
- URL: http://arxiv.org/abs/2503.14973v2
- Date: Mon, 16 Jun 2025 21:43:25 GMT
- Title: Behaviour Discovery and Attribution for Explainable Reinforcement Learning
- Authors: Rishav Rishav, Somjit Nath, Vincent Michalski, Samira Ebrahimi Kahou
- Abstract summary: Building trust in reinforcement learning (RL) agents requires understanding why they make certain decisions. Existing explainability methods often focus on single states or entire trajectories. We propose a fully offline, reward-free framework for behavior discovery and segmentation.
- Score: 6.123880364445758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building trust in reinforcement learning (RL) agents requires understanding why they make certain decisions, especially in high-stakes applications like robotics, healthcare, and finance. Existing explainability methods often focus on single states or entire trajectories, either providing only local, step-wise insights or attributing decisions to coarse, episode-level summaries. Both approaches miss the recurring strategies and temporally extended patterns that actually drive agent behavior across multiple decisions. We address this gap by proposing a fully offline, reward-free framework for behavior discovery and segmentation, enabling the attribution of actions to meaningful and interpretable behavior segments that capture recurring patterns appearing across multiple trajectories. Our method identifies coherent behavior clusters from state-action sequences and attributes individual actions to these clusters for fine-grained, behavior-centric explanations. Evaluations on four diverse offline RL environments show that our approach discovers meaningful behaviors and outperforms trajectory-level baselines in fidelity, human preference, and cluster coherence. Our code is publicly available.
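The abstract does not spell out the segmentation and clustering pipeline, so the following is only a minimal sketch of the general idea: split offline trajectories into fixed-length state-action segments, embed each segment, cluster the embeddings, and attribute individual actions to the nearest cluster. The function names, the mean-pooling embedding, and the k-means choice are assumptions, not the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_trajectory(states, actions, window=10):
    """Split one trajectory into fixed-length (state, action) segments.

    states: (T, d_s) array; actions: (T, d_a) array (continuous or one-hot).
    """
    pairs = np.concatenate([states, actions], axis=-1)          # (T, d_s + d_a)
    return [pairs[i:i + window] for i in range(0, len(pairs) - window + 1, window)]

def embed_segment(segment):
    """Crude segment embedding: mean over time of the state-action features."""
    return segment.mean(axis=0)

def discover_behaviours(trajectories, n_behaviours=8, window=10):
    """Cluster segments pooled from many offline trajectories into behaviour groups."""
    segments = [seg for (s, a) in trajectories
                for seg in segment_trajectory(s, a, window)]
    X = np.stack([embed_segment(seg) for seg in segments])
    km = KMeans(n_clusters=n_behaviours, n_init=10, random_state=0).fit(X)
    return km, segments

def attribute_action(km, state, action):
    """Attribute a single decision to the nearest behaviour cluster."""
    # Treat the single step as a degenerate segment so it lives in the same space.
    x = np.concatenate([state, action])[None, :]
    return int(km.predict(x)[0])
```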
Related papers
- Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision making problem. The aim is to dynamically select which feature to measure based on current observations, independently for each test instance. Common approaches either use Reinforcement Learning, which suffers from training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which is myopic. We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
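A hedged sketch of the acquisition rule described above: sample several latent completions of the unobserved features and score each candidate feature by the expected drop in label entropy. The three callables (`sample_latent`, `impute`, `predict_label`) are hypothetical stand-ins for the trained latent variable model, not the paper's API.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def score_candidates(x_obs, mask, sample_latent, impute, predict_label, n_samples=32):
    """Score each unobserved feature by the expected reduction in label entropy.

    x_obs: feature vector with unobserved entries filled by a placeholder (e.g. 0)
    mask:  boolean array, True where the feature has been observed
    sample_latent(x_obs, mask) -> z       : draw one latent consistent with the observations
    impute(z)                  -> x_full  : decode a full feature vector from z
    predict_label(x)           -> p(y|x)  : classifier probabilities
    """
    base = entropy(predict_label(x_obs))
    scores = np.zeros(len(mask))
    for _ in range(n_samples):
        z = sample_latent(x_obs, mask)
        x_full = impute(z)
        for j in np.where(~mask)[0]:                  # only unobserved features
            x_try, m_try = x_obs.copy(), mask.copy()
            x_try[j], m_try[j] = x_full[j], True
            scores[j] += (base - entropy(predict_label(x_try))) / n_samples
    return scores                                     # acquire argmax next
```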
arXiv Detail & Related papers (2025-08-03T23:48:46Z) - Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs).
We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education.
We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
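As an illustration of the soft, model-based reward idea (the exact prompting and score parsing used in the paper are not given here), a reward function might replace exact-match verification with a graded judge score; `judge` is a placeholder for any scorer returning a value in [0, 1].

```python
def verifiable_reward(answer: str, reference: str) -> float:
    """Binary verification: 1.0 only on an exact match."""
    return float(answer.strip() == reference.strip())

def soft_generative_reward(answer: str, reference: str, judge) -> float:
    """Soft, model-based reward: a generative judge scores partial correctness.

    `judge` is any callable returning a probability-like score in [0, 1],
    e.g. a prompted LLM whose text output is parsed into a number (placeholder).
    """
    prompt = (
        "Reference answer:\n" + reference +
        "\nCandidate answer:\n" + answer +
        "\nScore the candidate from 0 to 1 for correctness."
    )
    return max(0.0, min(1.0, judge(prompt)))
```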
arXiv Detail & Related papers (2025-03-31T08:22:49Z) - On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression [19.01804572722833]
In real-world sequential decision making tasks, learning from observed state-action trajectories is critical for tasks like imitation, classification, and clustering. We propose a novel method for embedding state-action trajectories into a latent space that captures the skills and competencies in the underlying dynamic decision-making processes.
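A minimal sketch of a trajectory encoder, assuming a plain GRU over concatenated state-action vectors; the paper's training objective, which is what makes the embedding capture skills, is not shown.

```python
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Embed a (state, action) trajectory into a single latent vector.

    The GRU architecture here is purely illustrative.
    """
    def __init__(self, state_dim, action_dim, latent_dim=64):
        super().__init__()
        self.rnn = nn.GRU(state_dim + action_dim, latent_dim, batch_first=True)

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        x = torch.cat([states, actions], dim=-1)
        _, h = self.rnn(x)              # h: (1, B, latent_dim)
        return h.squeeze(0)             # (B, latent_dim) trajectory embedding
```

The resulting embedding can then feed a downstream classifier, regressor, or clustering step.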
arXiv Detail & Related papers (2025-01-16T06:52:58Z) - Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z) - Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
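The formal-verification machinery is not reproduced here, but the Adversarial Rate can be illustrated with a simple Monte Carlo stand-in: the fraction of bounded perturbations that flip the greedy action (the function name and sampling scheme are illustrative).

```python
import numpy as np

def adversarial_rate(policy, states, epsilon=0.05, n_perturb=20, rng=None):
    """Empirical adversarial rate: the fraction of sampled perturbations
    (bounded by `epsilon` in the infinity norm) that change the greedy action.

    This is a Monte Carlo estimate; the paper derives the metric via formal
    verification rather than sampling.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    flipped, total = 0, 0
    for s in states:
        a = policy(s)
        for _ in range(n_perturb):
            delta = rng.uniform(-epsilon, epsilon, size=s.shape)
            flipped += int(policy(s + delta) != a)
            total += 1
    return flipped / total
```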
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - Causal State Distillation for Explainable Reinforcement Learning [16.998047658978482]
Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be challenging.
Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD).
RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner.
We present an extension of RD that goes beyond sub-rewards to provide more informative explanations.
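A hedged sketch of the reward-decomposition idea: one Q-head per reward component, each trained against its own sub-reward, so the chosen action can be explained by per-component contributions. The architecture below is illustrative, not the paper's model.

```python
import torch
import torch.nn as nn

class DecomposedQNetwork(nn.Module):
    """One Q-head per reward component; the total Q-value is their sum.

    Each head would be trained against its own sub-reward, letting an action
    be explained by how much each component contributes.
    """
    def __init__(self, state_dim, n_actions, n_components):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                           nn.Linear(128, n_actions))
             for _ in range(n_components)]
        )

    def forward(self, state):
        per_component = torch.stack([h(state) for h in self.heads], dim=0)
        return per_component.sum(dim=0), per_component   # total Q, per-component Qs
```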
arXiv Detail & Related papers (2023-12-30T00:01:22Z) - Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z) - A Simple Solution for Offline Imitation from Observations and Examples
with Possibly Incomplete Trajectories [122.11358440078581]
Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
arXiv Detail & Related papers (2023-11-02T15:41:09Z) - Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action matching principle is more an explanation of the underlying deep neural networks (DNNs) than an interpretation of the RL agent's behaviour.
We propose to consider rewards, the essential objective of RL agents, as the essential objective for interpreting RL agents as well.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z) - Leveraging Factored Action Spaces for Efficient Offline Reinforcement
Learning in Healthcare [38.42691031505782]
We propose a form of linear Q-function decomposition induced by factored action spaces.
Our approach can help an agent make more accurate inferences within underexplored regions of the state-action space.
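The decomposition itself is simple enough to sketch: with a factored action a = (a_1, ..., a_D), the Q-function is written as a sum of per-dimension terms, Q(s, a) = Σ_d Q_d(s, a_d). A minimal PyTorch version follows (layer sizes and names are assumptions).

```python
import torch
import torch.nn as nn

class FactoredQ(nn.Module):
    """Linear Q-decomposition over a factored action space:
    Q(s, a) = sum_d Q_d(s, a_d), one small head per action dimension.
    """
    def __init__(self, state_dim, action_sizes):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(state_dim, n) for n in action_sizes]   # Q_d(s, .) per dimension
        )

    def forward(self, state, action):
        # state: (B, state_dim); action: LongTensor (B, D),
        # with action[:, d] in [0, action_sizes[d]).
        q_parts = [head(state).gather(1, action[:, d:d + 1])
                   for d, head in enumerate(self.heads)]
        return torch.cat(q_parts, dim=1).sum(dim=1)           # (B,) total Q-value
```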
arXiv Detail & Related papers (2023-05-02T19:13:10Z) - Spatiotemporal Self-supervised Learning for Point Clouds in the Wild [65.56679416475943]
We introduce an SSL strategy that leverages positive pairs in both the spatial and temporal domain.
We demonstrate the benefits of our approach via extensive experiments performed by self-supervised training on two large-scale LiDAR datasets.
arXiv Detail & Related papers (2023-03-28T18:06:22Z) - RACCER: Towards Reachable and Certain Counterfactual Explanations for
Reinforcement Learning [2.0341936392563063]
We propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents.
We use a tree search to find the most suitable counterfactuals based on the defined properties.
We evaluate RACCER in two tasks as well as conduct a user study to show that RL-specific counterfactuals help users better understand agents' behavior.
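RACCER's actual properties and heuristics are richer than can be shown here, but the tree-search idea can be sketched as a breadth-first search over short action sequences in an assumed environment model, returning the shortest plan that reaches a state where the agent would pick the desired action.

```python
from itertools import product

def counterfactual_search(env_model, state, desired_action, policy, actions, horizon=3):
    """Find a nearby (easily reachable) state where the agent would choose
    `desired_action`, by searching short action sequences.

    `env_model(state, action) -> next_state` is an assumed simulator; the
    real RACCER objective scores candidates on several properties.
    """
    for depth in range(1, horizon + 1):                 # shortest plans first
        for plan in product(actions, repeat=depth):
            s = state
            for a in plan:
                s = env_model(s, a)
            if policy(s) == desired_action:
                return plan, s                          # (action sequence, counterfactual state)
    return None
```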
arXiv Detail & Related papers (2023-03-08T09:47:00Z) - Emergent Behaviors in Multi-Agent Target Acquisition [0.0]
We simulate a Multi-Agent System (MAS) using Reinforcement Learning (RL) in a pursuit-evasion game.
We create different adversarial scenarios by replacing RL-trained pursuers' policies with two distinct (non-RL) analytical strategies.
The novelty of our approach entails the creation of an influential feature set that reveals underlying data regularities.
arXiv Detail & Related papers (2022-12-15T15:20:58Z) - Disentangled Representation Learning [46.51815065323667]
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in the observable data in representation form.
We comprehensively investigate DRL from various aspects including motivations, definitions, methodologies, evaluations, applications, and model designs.
arXiv Detail & Related papers (2022-11-21T18:14:38Z) - Explainable Reinforcement Learning via Model Transforms [18.385505289067023]
We argue that even if the underlying Markov Decision Process is not fully known, it can nevertheless be exploited to automatically generate explanations.
We suggest using formal MDP abstractions and transforms, previously used in the literature for expediting the search for optimal policies, to automatically produce explanations.
arXiv Detail & Related papers (2022-09-24T13:18:06Z) - A Framework for Understanding and Visualizing Strategies of RL Agents [0.0]
We present a framework for learning comprehensible models of sequential decision tasks in which agent strategies are characterized using temporal logic formulas.
We evaluate our framework on combat scenarios in StarCraft II (SC2) using traces from a handcrafted expert policy and a trained reinforcement learning agent.
arXiv Detail & Related papers (2022-08-17T21:58:19Z) - An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z) - Beyond Rewards: a Hierarchical Perspective on Offline Multiagent
Behavioral Analysis [14.656957226255628]
We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains.
Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or models, and can be trained using entirely offline observational data.
arXiv Detail & Related papers (2022-06-17T23:07:33Z) - Object-Aware Regularization for Addressing Causal Confusion in Imitation
Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
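One simple way to express the "attend uniformly to all objects" idea is a regularizer that pulls per-object attention toward the uniform distribution. Note that OREO's actual mechanism (regularization over discrete object representations) differs, so this is only an illustrative stand-in.

```python
import torch
import torch.nn.functional as F

def uniform_object_attention_loss(attention_weights):
    """Encourage the policy to spread attention evenly over semantic objects.

    attention_weights: (B, n_objects), rows sum to 1 (e.g. a softmax over
    per-object features). The loss is the KL divergence from the uniform
    distribution to the policy's attention.
    """
    n = attention_weights.size(-1)
    uniform = torch.full_like(attention_weights, 1.0 / n)
    return F.kl_div(attention_weights.clamp_min(1e-8).log(), uniform,
                    reduction="batchmean")
```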
arXiv Detail & Related papers (2021-10-27T01:56:23Z) - Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aimed at attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
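The regularization objective can be sketched as a standard variational information-bottleneck penalty with an annealed coefficient; the schedule and coefficient below are illustrative, not the paper's settings.

```python
import torch

def ib_regularized_loss(task_loss, z_mean, z_logvar, step,
                        anneal_steps=100_000, beta_max=1e-3):
    """Task loss plus an annealed information-bottleneck penalty.

    The KL of the stochastic encoding q(z|s) = N(z_mean, exp(z_logvar)) against
    a standard normal prior limits how much state information the agent keeps;
    the coefficient beta is ramped up gradually (annealing schedule assumed).
    """
    kl = 0.5 * torch.sum(z_mean.pow(2) + z_logvar.exp() - z_logvar - 1.0, dim=-1).mean()
    beta = beta_max * min(1.0, step / anneal_steps)
    return task_loss + beta * kl
```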
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.