Induced Modularity and Community Detection for Functionally Interpretable Reinforcement Learning
- URL: http://arxiv.org/abs/2501.17077v1
- Date: Tue, 28 Jan 2025 17:02:16 GMT
- Title: Induced Modularity and Community Detection for Functionally Interpretable Reinforcement Learning
- Authors: Anna Soligo, Pietro Ferraro, David Boyle
- Abstract summary: Interpretability in reinforcement learning is crucial for ensuring AI systems align with human values.
We show how penalisation of non-local weights leads to the emergence of functionally independent modules in the policy network of a reinforcement learning agent.
- Abstract: Interpretability in reinforcement learning is crucial for ensuring AI systems align with human values and fulfill the diverse related requirements including safety, robustness and fairness. Building on recent approaches to encouraging sparsity and locality in neural networks, we demonstrate how the penalisation of non-local weights leads to the emergence of functionally independent modules in the policy network of a reinforcement learning agent. To illustrate this, we demonstrate the emergence of two parallel modules for assessment of movement along the X and Y axes in a stochastic Minigrid environment. Through the novel application of community detection algorithms, we show how these modules can be automatically identified and their functional roles verified through direct intervention on the network weights prior to inference. This establishes a scalable framework for reinforcement learning interpretability through functional modularity, addressing challenges regarding the trade-off between completeness and cognitive tractability of reinforcement learning explanations.
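The penalisation of non-local weights described in the abstract can be sketched as a distance-weighted L1 regulariser. The layout and scaling below are illustrative assumptions, not the paper's exact formulation: neurons in each layer are assigned evenly spaced positions, and each weight is penalised in proportion to the distance it spans.

```python
import numpy as np

def locality_penalty(weight: np.ndarray) -> float:
    """Distance-weighted L1 penalty for one fully connected layer.

    Hypothetical sketch: neurons are placed at evenly spaced positions
    on [0, 1] within each layer, and each weight is penalised by the
    distance between the neurons it connects, so long-range
    ("non-local") connections are discouraged.
    """
    n_out, n_in = weight.shape
    pos_in = np.linspace(0.0, 1.0, n_in)
    pos_out = np.linspace(0.0, 1.0, n_out)
    dist = np.abs(pos_out[:, None] - pos_in[None, :])  # (n_out, n_in)
    return float(np.sum(np.abs(weight) * dist))
```

Adding such a term to the RL loss leaves purely local connections unpenalised, so minimising it pushes the policy network toward spatially clustered, modular wiring.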
Related papers
- Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity [51.40558987254471]
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations.
This paper addresses the question of reinforcement learning under *general* latent dynamics from a statistical and algorithmic perspective.
arXiv Detail & Related papers (2024-10-23T14:22:49Z) - Training Neural Networks for Modularity aids Interpretability [0.6749750044497732]
An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently.
We find pretrained models to be highly unclusterable and thus train models to be more modular using an "enmeshment loss" function that encourages the formation of non-interacting clusters.
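As an illustration of clusterability, communities in a trained network's weight graph can be recovered with standard graph partitioning. The sketch below uses spectral bisection (the sign pattern of the Fiedler vector of the graph Laplacian) as a simple stand-in for the community detection methods these papers discuss; the adjacency construction from absolute weights is an assumption.

```python
import numpy as np

def spectral_bisection(adj: np.ndarray) -> np.ndarray:
    """Split a neuron graph into two communities by spectral bisection.

    adj is a symmetric, non-negative adjacency matrix, e.g. built from
    the absolute weights between neurons. The signs of the Fiedler
    vector (the eigenvector of the second-smallest eigenvalue of the
    graph Laplacian) give a two-way partition.
    """
    degree = np.diag(adj.sum(axis=1))
    laplacian = degree - adj
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues ascending
    fiedler = eigvecs[:, 1]
    return fiedler >= 0  # boolean community labels
```

On a weight graph with two densely connected clusters joined by weak links, the two boolean labels recover the clusters; repeated bisection extends this to more than two communities.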
arXiv Detail & Related papers (2024-09-24T05:03:49Z) - Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions [2.50194939587674]
We propose a new Model-based RL framework to enable efficient policy learning with unknown dynamics.
We introduce and explore a novel method for adding safety constraints for model-based RL during training and policy learning.
arXiv Detail & Related papers (2024-05-25T11:21:12Z) - Self-Supervised Interpretable End-to-End Learning via Latent Functional Modularity [2.163881720692685]
MoNet is a novel functionally modular network for self-supervised and interpretable end-to-end learning.
In real-world indoor environments, MoNet demonstrates effective visual autonomous navigation, outperforming baseline models by 7% to 28%.
arXiv Detail & Related papers (2024-02-21T15:17:20Z) - Foundations of Reinforcement Learning and Interactive Decision Making [81.76863968810423]
We present a unifying framework for addressing the exploration-exploitation dilemma using frequentist and Bayesian approaches.
Special attention is paid to function approximation and flexible model classes such as neural networks.
arXiv Detail & Related papers (2023-12-27T21:58:45Z) - Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency [2.2237337682863125]
This study examines the generalization of feature attributions across various deep learning architectures.
We aim to develop a more coherent and optimistic understanding of feature attributions.
Our findings highlight the potential for harmonized feature attribution methods to improve interpretability and foster trust in machine learning applications.
arXiv Detail & Related papers (2023-07-05T09:46:41Z) - Interpreting Neural Policies with Disentangled Tree Representations [58.769048492254555]
We study interpretability of compact neural policies through the lens of disentangled representation.
We leverage decision trees to obtain factors of variation for disentanglement in robot learning.
We introduce interpretability metrics that measure disentanglement of learned neural dynamics.
arXiv Detail & Related papers (2022-10-13T01:10:41Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
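For context, the basic Q-learning update with linear function approximation that this work builds on can be sketched as follows. This is the vanilla TD update, not the paper's exploration variant, and all names are illustrative.

```python
import numpy as np

def q_update(w, phi_s, a, r, phi_next, alpha=0.1, gamma=0.99):
    """One Q-learning step with linear function approximation.

    Q(s, a) is approximated as w[a] @ phi(s), with one weight vector
    per action. The weights of the taken action a move toward the
    bootstrap target r + gamma * max_a' Q(s', a').
    """
    target = r + gamma * max(w_b @ phi_next for w_b in w)
    td_error = target - w[a] @ phi_s
    w[a] = w[a] + alpha * td_error * phi_s
    return w
```

The approximation-error notion in the paper concerns how far the true Q-function is from this linear class; the result is that performance degrades gracefully as that error grows.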
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Meta-learning using privileged information for dynamics [66.32254395574994]
We extend the Neural ODE Process model to use additional information within the Learning Using Privileged Information setting.
We validate our extension with experiments showing improved accuracy and calibration on simulated dynamics tasks.
arXiv Detail & Related papers (2021-04-29T12:18:02Z) - Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
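One standard form of such a behavior-prior objective, written here as a generic sketch rather than the paper's exact formulation, trades expected reward against divergence from a prior policy $\pi_0$:

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t} \gamma^{t}
  \Big( r(s_t, a_t)
  - \alpha \, \mathrm{KL}\big( \pi(\cdot \mid s_t) \,\|\, \pi_0(\cdot \mid s_t) \big) \Big) \right]
```

Increasing $\alpha$ pulls the task policy toward the prior, which is how learned behavior priors regularise and accelerate learning on new tasks.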
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.