Explainable Reinforcement Learning via Model Transforms
- URL: http://arxiv.org/abs/2209.12006v1
- Date: Sat, 24 Sep 2022 13:18:06 GMT
- Title: Explainable Reinforcement Learning via Model Transforms
- Authors: Mira Finkelstein, Lucy Liu, Nitsan Levy Schlot, Yoav Kolumbus, David C. Parkes, Jeffrey S. Rosenschein and Sarah Keren
- Abstract summary: We argue that even if the underlying Markov Decision Process is not fully known, it can nevertheless be exploited to automatically generate explanations.
We suggest using formal MDP abstractions and transforms, previously used in the literature for expediting the search for optimal policies, to automatically produce explanations.
- Score: 18.385505289067023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding emerging behaviors of reinforcement learning (RL) agents may be
difficult since such agents are often trained in complex environments using
highly complex decision making procedures. This has given rise to a variety of
approaches to explainability in RL that aim to reconcile discrepancies that may
arise between the behavior of an agent and the behavior that is anticipated by
an observer. Most recent approaches have relied either on domain knowledge,
that may not always be available, on an analysis of the agent's policy, or on
an analysis of specific elements of the underlying environment, typically
modeled as a Markov Decision Process (MDP). Our key claim is that even if the
underlying MDP is not fully known (e.g., the transition probabilities have not
been accurately learned) or is not maintained by the agent (i.e., when using
model-free methods), it can nevertheless be exploited to automatically generate
explanations. For this purpose, we suggest using formal MDP abstractions and
transforms, previously used in the literature for expediting the search for
optimal policies, to automatically produce explanations. Since such transforms
are typically based on a symbolic representation of the environment, they may
represent meaningful explanations for gaps between the anticipated and actual
agent behavior. We formally define this problem, suggest a class of transforms
that can be used for explaining emergent behaviors, and suggest methods that
enable efficient search for an explanation. We demonstrate the approach on a
set of standard benchmarks.
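To make the idea concrete, here is a minimal Python sketch of the search-for-an-explanation loop described above. It assumes a small tabular MDP given as a transition tensor P of shape (S, A, S) and a reward matrix R of shape (S, A); the transform class (a single hypothetical "remove slip" transform) and the acceptance test (the observer's anticipated policy becomes optimal under the transformed model) are illustrative stand-ins, not the paper's exact formalization.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Greedy policy of a tabular MDP. P: (S, A, S) transitions, R: (S, A) rewards."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)              # one-step Bellman backup, shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1)          # deterministic greedy policy
        V = V_new

def remove_slip(P, R):
    """Hypothetical transform: make transitions deterministic by moving all
    probability mass onto each action's most likely successor state."""
    P_det = np.zeros_like(P)
    s_idx, a_idx = np.indices(P.shape[:2])
    P_det[s_idx, a_idx, P.argmax(axis=2)] = 1.0
    return P_det, R

def explain_by_transform(P, R, anticipated_policy, candidates):
    """Return the first candidate transform under which the observer's
    anticipated policy is optimal; that model change is offered as the
    explanation for the gap between anticipated and actual behavior."""
    for description, transform in candidates:
        P_t, R_t = transform(P.copy(), R.copy())
        if np.array_equal(value_iteration(P_t, R_t), anticipated_policy):
            return description
    return None                              # no transform in the class explains the gap

# Usage (hypothetical inputs): the observer expected a shorter, riskier route,
# which is only optimal if transitions are treated as deterministic.
# candidates = [("transitions treated as deterministic (no slip)", remove_slip)]
# print(explain_by_transform(P, R, anticipated_policy, candidates))
```

The returned description points at the model element the observer appears to have misjudged (here, treating stochastic transitions as deterministic), which is the kind of gap-explaining transform the abstract describes; the paper's own transform class and search procedure are more general.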
Related papers
- Demystifying Reinforcement Learning in Production Scheduling via Explainable AI [0.7515066610159392]
Deep Reinforcement Learning (DRL) is a frequently employed technique to solve scheduling problems.
Although DRL agents excel at delivering viable results in short computing times, their reasoning remains opaque.
We apply two explainable AI (xAI) frameworks to describe the reasoning behind scheduling decisions of a specialized DRL agent in a flow production.
arXiv Detail & Related papers (2024-08-19T09:39:01Z)
- Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation [7.647395374489533]
We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions.
We show that our approach generates explanations as helpful as those produced by a human domain expert.
arXiv Detail & Related papers (2023-11-29T20:16:23Z)
- Inverse Decision Modeling: Learning Interpretable Representations of Behavior [72.80902932543474]
We develop an expressive, unifying perspective on inverse decision modeling.
We use this to formalize the inverse problem (as a descriptive model).
We illustrate how this structure enables learning (interpretable) representations of (bounded) rationality.
arXiv Detail & Related papers (2023-10-28T05:05:01Z)
- Explainable Multi-Agent Reinforcement Learning for Temporal Queries [18.33682005623418]
This work presents an approach for generating policy-level contrastive explanations for MARL to answer a temporal user query.
The proposed approach encodes the temporal query as a PCTL logic formula and checks if the query is feasible under a given MARL policy.
The results of a user study show that the generated explanations significantly improve user performance and satisfaction.
arXiv Detail & Related papers (2023-05-17T17:04:29Z)
- GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies through Visual Counterfactual Explanations [0.7874708385247353]
We propose a novel but simple method to generate counterfactual explanations for RL agents.
Our method is fully model-agnostic and we demonstrate that it outperforms the only previous method in several computational metrics.
arXiv Detail & Related papers (2023-02-24T15:29:43Z)
- Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration [17.27164535440641]
Posterior sampling is a promising approach, but it requires Bayesian inference and dynamic programming.
We show that even though partial models exclude relevant information from the environment, they can nevertheless lead to good policies.
arXiv Detail & Related papers (2023-02-08T18:35:24Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- Plannable Approximations to MDP Homomorphisms: Equivariance under Actions [72.30921397899684]
We introduce a contrastive loss function that enforces action equivariance on the learned representations.
We prove that when our loss is zero, we have a homomorphism of a deterministic Markov Decision Process.
We show experimentally that for deterministic MDPs, the optimal policy in the abstract MDP can be successfully lifted to the original MDP.
arXiv Detail & Related papers (2020-02-27T08:29:10Z)
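For the MDP-homomorphism entry above, the core idea of a contrastive loss that enforces action equivariance can be sketched as follows. This is a minimal illustration assuming a state encoder `encode` and a latent transition model `latent_step` (both hypothetical names); the hinge-with-negative-sample form follows the general recipe implied by the summary, not necessarily the paper's exact objective.

```python
import numpy as np

def equivariance_loss(encode, latent_step, s, a, s_next, s_neg, margin=1.0):
    """Contrastive loss enforcing action equivariance: applying the latent
    transition to z = encode(s) should land near encode(s_next) and stay at
    least `margin` away from the encoding of a randomly drawn negative state."""
    pred = latent_step(encode(s), a)                 # predicted next latent state
    positive = np.sum((pred - encode(s_next)) ** 2)  # pull toward true successor
    negative = np.sum((pred - encode(s_neg)) ** 2)   # push away from the negative
    return positive + max(0.0, margin - negative)
```

When such a loss is driven to zero over the transitions of a deterministic MDP, latent transitions mirror environment transitions, which is the homomorphism property the summary refers to; planning in the abstract (latent) MDP then yields a policy that can be lifted back to the original MDP.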