TripleTree: A Versatile Interpretable Representation of Black Box Agents
and their Environments
- URL: http://arxiv.org/abs/2009.04743v2
- Date: Mon, 21 Sep 2020 16:06:19 GMT
- Title: TripleTree: A Versatile Interpretable Representation of Black Box Agents
and their Environments
- Authors: Tom Bewley, Jonathan Lawry
- Abstract summary: We suggest that a versatile first step towards general understanding is to discretise the state space into convex regions.
We create such a representation using a novel variant of the CART decision tree algorithm.
We demonstrate how it facilitates practical understanding of black box agents through prediction, visualisation and rule-based explanation.
- Score: 9.822870889029113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In explainable artificial intelligence, there is increasing interest in
understanding the behaviour of autonomous agents to build trust and validate
performance. Modern agent architectures, such as those trained by deep
reinforcement learning, are currently so lacking in interpretable structure as
to effectively be black boxes, but insights may still be gained from an
external, behaviourist perspective. Inspired by conceptual spaces theory, we
suggest that a versatile first step towards general understanding is to
discretise the state space into convex regions, jointly capturing similarities
over the agent's action, value function and temporal dynamics within a dataset
of observations. We create such a representation using a novel variant of the
CART decision tree algorithm, and demonstrate how it facilitates practical
understanding of black box agents through prediction, visualisation and
rule-based explanation.
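The abstract describes splitting the state space with a CART-style tree whose criterion jointly reflects the agent's action, value function and temporal dynamics. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's implementation: it treats all three targets as continuous, uses per-node variance as the impurity for each, weights them equally, and the names weighted_impurity and best_split are hypothetical.

```python
"""Illustrative CART-style split that jointly scores an agent's action,
value and temporal dynamics, in the spirit of the abstract above.
Assumptions (not from the paper): continuous targets, per-node variance
as the impurity for each target, equal weights."""
import numpy as np


def weighted_impurity(a, v, d, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of variances of action, value and state-change targets."""
    w_a, w_v, w_d = weights
    return (w_a * a.var(axis=0).sum()
            + w_v * v.var()
            + w_d * d.var(axis=0).sum())


def best_split(X, a, v, d, min_samples=10):
    """Greedy search for the axis-aligned split that most reduces the joint
    (action + value + dynamics) impurity.
    Returns (feature, threshold, impurity reduction)."""
    n, n_features = X.shape
    parent = weighted_impurity(a, v, d)
    best = (None, None, 0.0)
    for f in range(n_features):
        order = np.argsort(X[:, f])
        for i in range(min_samples, n - min_samples):
            thresh = 0.5 * (X[order[i - 1], f] + X[order[i], f])
            left, right = order[:i], order[i:]
            child = (len(left) / n * weighted_impurity(a[left], v[left], d[left])
                     + len(right) / n * weighted_impurity(a[right], v[right], d[right]))
            gain = parent - child
            if gain > best[2]:
                best = (f, thresh, gain)
    return best


# Toy usage: 200 two-dimensional states with 1-D actions, scalar values and
# state deltas (next_state - state) standing in for temporal dynamics.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
a = (X[:, :1] > 0.5).astype(float)       # action flips across x0 = 0.5
v = X[:, 1:2]                            # value grows with x1
d = np.where(X[:, :1] > 0.5, 0.1, -0.1)  # dynamics also flip across x0
print(best_split(X, a, v, d))
```

Recursively applying such a split to the resulting child nodes would carve the state space into axis-aligned, hence convex, regions of the kind the abstract describes.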
Related papers
- Disentangling Representations through Multi-task Learning [0.0]
We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve classification tasks.
We experimentally validate these predictions in RNNs trained on multi-task classification.
We find that transformers are particularly suited for disentangling representations, which might explain their unique world-understanding abilities.
arXiv Detail & Related papers (2024-07-15T21:32:58Z) - Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture.
With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered an effective alternative to traditional CNNs and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z) - Learning Interpretable Policies in Hindsight-Observable POMDPs through
Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Uncovering Unique Concept Vectors through Latent Space Decomposition [0.0]
Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates.
We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training.
Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
arXiv Detail & Related papers (2023-07-13T17:21:54Z) - (Un)reasonable Allure of Ante-hoc Interpretability for High-stakes
Domains: Transparency Is Necessary but Insufficient for Comprehensibility [25.542848590851758]
Ante-hoc interpretability has become the holy grail of explainable artificial intelligence for high-stakes domains such as healthcare.
It can refer to predictive models whose structure adheres to domain-specific constraints, or ones that are inherently transparent.
We unpack this concept to better understand what is needed for its safe adoption across high-stakes domains.
arXiv Detail & Related papers (2023-06-04T09:34:41Z) - Unsupervised Interpretable Basis Extraction for Concept-Based Visual
Explanations [53.973055975918655]
We show that intermediate-layer representations become more interpretable when transformed to the bases extracted with our method.
We compare the bases extracted with our method against those derived with a supervised approach and find that, in one respect, the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one, and we give potential directions for future research.
arXiv Detail & Related papers (2023-03-19T00:37:19Z) - Stochastic Coherence Over Attention Trajectory For Continuous Learning
In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z) - Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from past trajectories.
This trait vector then multiplicatively modulates the prediction mechanism via a fast-weights scheme in the prediction neural network.
We empirically show that the fast weights provide a good inductive bias for modelling the character traits of agents and hence improve mindreading ability.
arXiv Detail & Related papers (2022-04-17T11:21:18Z) - Interpretable Representations in Explainable AI: From Theory to Practice [7.031336702345381]
Interpretable representations are the backbone of many explainers that target black-box predictive systems.
We study properties of interpretable representations that encode presence and absence of human-comprehensible concepts.
arXiv Detail & Related papers (2020-08-16T21:44:03Z) - Probing Emergent Semantics in Predictive Agents via Question Answering [29.123837711842995]
Recent work has shown how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments.
We propose question-answering as a general paradigm to decode and understand the representations that such agents develop.
We probe their internal state representations with synthetic (English) questions, without backpropagating gradients from the question-answering decoder into the agent (a minimal sketch of this stop-gradient probe follows the list).
arXiv Detail & Related papers (2020-06-01T15:27:36Z)
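The last entry above probes an agent's internal state with synthetic questions while blocking gradient flow from the question-answering decoder back into the agent. Below is a minimal, hypothetical sketch of that stop-gradient setup in PyTorch; the dimensions, module names and the assumption that questions arrive pre-encoded as vectors are illustrative, not details from the paper.

```python
"""Illustrative probe of a black-box agent's hidden state via question
answering, with gradients blocked from flowing into the agent.
All sizes and names below are assumptions for the sketch."""
import torch
import torch.nn as nn

HIDDEN, Q_DIM, N_ANSWERS = 64, 32, 10


class QAProbe(nn.Module):
    """Small decoder mapping (agent hidden state, encoded question) to an answer."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIDDEN + Q_DIM, 128), nn.ReLU(), nn.Linear(128, N_ANSWERS))

    def forward(self, agent_state, question):
        # detach() stops gradients from the probe reaching the agent,
        # so probing cannot alter the representations being studied.
        return self.net(torch.cat([agent_state.detach(), question], dim=-1))


probe = QAProbe()
optim = torch.optim.Adam(probe.parameters(), lr=1e-3)

# Toy batch standing in for (agent hidden states, encoded questions, answers).
agent_state = torch.randn(8, HIDDEN, requires_grad=True)  # from the black-box agent
question = torch.randn(8, Q_DIM)
answer = torch.randint(0, N_ANSWERS, (8,))

logits = probe(agent_state, question)
loss = nn.functional.cross_entropy(logits, answer)
loss.backward()          # gradients flow into the probe only...
optim.step()
print(agent_state.grad)  # ...None: nothing reached the agent's state
```

Because the agent's hidden state is detached before the probe sees it, training the probe can reveal what the representation already encodes without reshaping it.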