Exploiting Language Instructions for Interpretable and Compositional
Reinforcement Learning
- URL: http://arxiv.org/abs/2001.04418v1
- Date: Mon, 13 Jan 2020 17:35:56 GMT
- Title: Exploiting Language Instructions for Interpretable and Compositional
Reinforcement Learning
- Authors: Michiel van der Meer, Matteo Pirotta, Elia Bruni
- Abstract summary: We attempt to interpret the latent space from an RL agent to identify its current objective in a complex language instruction.
Results show that the classification process causes changes in the hidden states that make them more easily interpretable.
We limit the supervisory signal on the classification, and observe a similar but less notable effect.
- Score: 23.41381408504966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present an alternative approach to making an agent
compositional through the use of a diagnostic classifier. Because of the need
for explainable agents in automated decision processes, we attempt to interpret
the latent space from an RL agent to identify its current objective in a
complex language instruction. Results show that the classification process
causes changes in the hidden states that make them more easily interpretable,
but also causes a shift in zero-shot performance on novel instructions. Lastly,
we limit the supervisory signal on the classification, and observe a similar
but less notable effect.
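The abstract describes attaching a diagnostic classifier to the agent's latent space to read out which part of a compositional instruction the agent is currently pursuing. Below is a minimal sketch of such a probe in PyTorch; the hidden-state size, the number of sub-goals, and the `train_probe` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a diagnostic classifier that probes an
# RL agent's hidden state to predict which sub-instruction is currently active.
# HIDDEN_DIM, NUM_SUBGOALS, and the label encoding are assumptions for illustration.
import torch
import torch.nn as nn

HIDDEN_DIM = 256     # assumed size of the agent's recurrent hidden state
NUM_SUBGOALS = 4     # assumed number of sub-instructions per composite instruction

# A linear probe mapping a frozen hidden state to logits over candidate sub-goals.
probe = nn.Linear(HIDDEN_DIM, NUM_SUBGOALS)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_probe(hidden_states: torch.Tensor, subgoal_labels: torch.Tensor, epochs: int = 10) -> float:
    """hidden_states: (N, HIDDEN_DIM) detached agent states collected during rollouts;
    subgoal_labels: (N,) index of the sub-instruction active at each timestep."""
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = probe(hidden_states)
        loss = loss_fn(logits, subgoal_labels)
        loss.backward()
        optimizer.step()
    return loss.item()
```

Note that this sketch only reads frozen hidden states, so it illustrates the diagnostic step. In the paper, the classification loss also acts as a supervisory signal on the agent (later deliberately limited), which is what produces the reported changes in the hidden states and the shift in zero-shot performance.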
Related papers
- CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models [59.8529196670565]
CRAT is a novel multi-agent translation framework that leverages RAG and causality-enhanced self-reflection to address translation challenges.
Our results show that CRAT significantly improves translation accuracy, particularly in handling context-sensitive terms and emerging vocabulary.
arXiv Detail & Related papers (2024-10-28T14:29:11Z)
- Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition [55.97779732051921]
State-of-the-art classifiers for facial expression recognition (FER) lack interpretability, an important feature for end-users.
A new learning strategy is proposed to explicitly incorporate action unit (AU) cues into classifier training, making it possible to train deep interpretable models.
Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time.
arXiv Detail & Related papers (2024-10-01T10:42:55Z)
- Transformer-based Causal Language Models Perform Clustering [20.430255724239448]
We introduce a simplified instruction-following task and use synthetic datasets to analyze a Transformer-based causal language model.
Our findings suggest that the model learns task-specific information by clustering data within its hidden space, with this clustering process evolving dynamically during learning.
arXiv Detail & Related papers (2024-02-19T14:02:31Z)
- Semi-supervised counterfactual explanations [3.6810543937967912]
We address the challenge of generating counterfactual explanations that lie in the same data distribution as that of the training data.
This requirement has been addressed through the incorporation of auto-encoder reconstruction loss in the counterfactual search process.
We show further improvement in the interpretability of counterfactual explanations when the auto-encoder is trained in a semi-supervised fashion with class tagged input data.
arXiv Detail & Related papers (2023-03-22T15:17:16Z)
- Explainable Reinforcement Learning via Model Transforms [18.385505289067023]
We argue that even if the underlying Markov Decision Process is not fully known, it can nevertheless be exploited to automatically generate explanations.
We suggest using formal MDP abstractions and transforms, previously used in the literature for expediting the search for optimal policies, to automatically produce explanations.
arXiv Detail & Related papers (2022-09-24T13:18:06Z)
- Homomorphism Autoencoder -- Learning Group Structured Representations from Observed Transitions [51.71245032890532]
We propose methods enabling an agent acting upon the world to learn internal representations of sensory information consistent with actions that modify it.
In contrast to existing work, our approach does not require prior knowledge of the group and does not restrict the set of actions the agent can perform.
arXiv Detail & Related papers (2022-07-25T11:22:48Z)
- Compositional Generalization in Grounded Language Learning via Induced Model Sparsity [81.38804205212425]
We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations.
We design an agent that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal.
Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations.
arXiv Detail & Related papers (2022-07-06T08:46:27Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- A Novel Approach to Curiosity and Explainable Reinforcement Learning via Interpretable Sub-Goals [0.0]
Two key challenges within Reinforcement Learning involve improving (a) agent learning within environments with sparse extrinsic rewards and (b) the explainability of agent actions.
We describe a curious subgoal focused agent to address both these challenges.
We use a novel method for curiosity derived from a Generative Adversarial Network (GAN)-based model of environment transitions.
arXiv Detail & Related papers (2021-04-14T05:21:13Z)
- Counterfactual Detection meets Transfer Learning [48.82717416666232]
We show that detecting counterfactuals is a straightforward binary classification task that can be implemented with minimal adaptation of existing model architectures.
We introduce a new end-to-end pipeline that processes antecedents and consequents as an entity recognition task, casting them as token classification.
arXiv Detail & Related papers (2020-05-27T02:02:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.