Efficiently Guiding Imitation Learning Agents with Human Gaze
- URL: http://arxiv.org/abs/2002.12500v4
- Date: Wed, 21 Apr 2021 21:39:21 GMT
- Title: Efficiently Guiding Imitation Learning Agents with Human Gaze
- Authors: Akanksha Saran, Ruohan Zhang, Elaine Schaertl Short and Scott Niekum
- Abstract summary: We use gaze cues from human demonstrators to enhance the performance of agents trained via three popular imitation learning methods.
Based on similarities between the attention of reinforcement learning agents and human gaze, we propose a novel approach for utilizing gaze data in a computationally efficient manner.
Our proposed approach improves the performance by 95% for BC, 343% for BCO, and 390% for T-REX, averaged over 20 different Atari games.
- Score: 28.7222865388462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human gaze is known to be an intention-revealing signal in human
demonstrations of tasks. In this work, we use gaze cues from human
demonstrators to enhance the performance of agents trained via three popular
imitation learning methods -- behavioral cloning (BC), behavioral cloning from
observation (BCO), and Trajectory-ranked Reward EXtrapolation (T-REX). Based on
similarities between the attention of reinforcement learning agents and human
gaze, we propose a novel approach for utilizing gaze data in a computationally
efficient manner, as part of an auxiliary loss function, which guides a network
to have higher activations in image regions where the human's gaze fixated.
This work is a step towards augmenting any existing convolutional imitation
learning agent's training with auxiliary gaze data. Our auxiliary
coverage-based gaze loss (CGL) guides learning toward a better reward function
or policy, without adding any additional learnable parameters and without
requiring gaze data at test time. We find that our proposed approach improves
the performance by 95% for BC, 343% for BCO, and 390% for T-REX, averaged over
20 different Atari games. We also find that compared to a prior
state-of-the-art imitation learning method assisted by human gaze (AGIL), our
method achieves better performance, and is more efficient in terms of learning
with fewer demonstrations. We further interpret trained CGL agents with a
saliency map visualization method to explain their performance. Finally, we
show that CGL can help alleviate the well-known causal confusion problem in
imitation learning.
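For intuition, a minimal PyTorch sketch of what a coverage-style auxiliary gaze loss could look like is given below, assuming a channel-collapsed activation map and a gaze heatmap resized to the same spatial grid; the function name, the one-sided KL form, and the normalization are illustrative assumptions, and the exact CGL definition is given in the paper.

```python
import torch

def coverage_gaze_loss(activation_map, gaze_map, eps=1e-10):
    """Illustrative coverage-style gaze loss (hypothetical, not the paper's exact CGL).

    activation_map: (B, H, W) spatial activations from a conv layer,
                    e.g. feature maps summed over channels.
    gaze_map:       (B, H, W) human gaze heatmap resized to the same grid.
    """
    # Normalize both maps into per-image spatial distributions.
    a = activation_map / (activation_map.sum(dim=(-2, -1), keepdim=True) + eps)
    g = gaze_map / (gaze_map.sum(dim=(-2, -1), keepdim=True) + eps)
    # One-sided KL-style penalty: only regions where gaze density exceeds
    # activation density contribute, pushing the network to "cover" fixated
    # regions without punishing attention elsewhere. The loss adds no
    # learnable parameters and needs no gaze data at test time.
    covered = torch.minimum(a, g)
    loss = (g * torch.log((g + eps) / (covered + eps))).sum(dim=(-2, -1))
    return loss.mean()

# Usage: total = bc_loss + lambda_gaze * coverage_gaze_loss(act, gaze)
```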
Related papers
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
Selecting the set of human demonstrations most beneficial for learning is a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages learned representations to be predictive at both the temporal and agent levels by reconstructing masked agent observations in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
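To make the masked-reconstruction idea concrete, here is a toy PyTorch sketch that reconstructs a masked agent's latent from its teammates via attention and scores the reconstruction contrastively; the module name, the attention layer, and the InfoNCE scoring are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLatentContrastive(nn.Module):
    """Toy sketch: reconstruct a masked agent's latent from its teammates
    via attention and score the reconstruction contrastively."""
    def __init__(self, obs_dim, latent_dim=64, n_heads=4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)
        self.attn = nn.MultiheadAttention(latent_dim, n_heads, batch_first=True)

    def forward(self, obs, mask_idx, temperature=0.1):
        # obs: (B, N_agents, obs_dim); mask_idx: which agent to mask.
        z = self.encoder(obs)                    # (B, N, D) target latents
        z_in = z.clone()
        z_in[:, mask_idx] = 0.0                  # hide the masked agent
        recon, _ = self.attn(z_in, z_in, z_in)   # attend over teammates
        pred = recon[:, mask_idx]                # (B, D) reconstruction
        target = z[:, mask_idx].detach()         # (B, D) true latent
        # InfoNCE: other batch elements serve as negatives.
        logits = pred @ target.t() / temperature
        labels = torch.arange(obs.size(0), device=obs.device)
        return F.cross_entropy(logits, labels)
```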
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Improving Learning from Demonstrations by Learning from Experience [4.605233477425785]
We propose a new algorithm named TD3fG that can smoothly transition from learning from experts to learning from experience.
Our algorithm achieves good performance in MuJoCo environments with limited and sub-optimal demonstrations.
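One plausible reading of such a smooth transition is an annealed mix of a behavioral-cloning term and a TD3-style actor objective, sketched below; the function name, the linear schedule, and the actor/critic signatures are assumptions, not the TD3fG algorithm itself.

```python
def actor_loss_with_demo_guidance(actor, critic, batch, demo_batch, step,
                                  anneal_steps=100_000):
    # TD3-style actor objective on replay data (critic scores state-action pairs).
    rl_loss = -critic(batch["obs"], actor(batch["obs"])).mean()
    # Behavioral-cloning term on demonstration data.
    bc_loss = ((actor(demo_batch["obs"]) - demo_batch["act"]) ** 2).mean()
    # Linearly decay the demonstration weight toward pure RL.
    w = max(0.0, 1.0 - step / anneal_steps)
    return (1.0 - w) * rl_loss + w * bc_loss
```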
arXiv Detail & Related papers (2021-11-16T00:40:31Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
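The relabeling idea from the title can be sketched as recomputing stored rewards whenever the learned reward model is updated from human preferences; the buffer layout and names below are assumptions for illustration.

```python
import torch

def relabel_replay_buffer(buffer, reward_model, batch_size=1024):
    """Recompute stored rewards with the current learned reward model so
    off-policy learning stays consistent after each preference update."""
    with torch.no_grad():
        for start in range(0, buffer.size, batch_size):
            sl = slice(start, min(start + batch_size, buffer.size))
            buffer.reward[sl] = reward_model(buffer.obs[sl], buffer.act[sl])
```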
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Imitation Learning with Human Eye Gaze via Multi-Objective Prediction [3.5779268406205618]
We propose Gaze Regularized Imitation Learning (GRIL), a novel context-aware imitation learning architecture.
GRIL learns concurrently from both human demonstrations and eye gaze to solve tasks where visual attention provides important context.
We show that GRIL outperforms several state-of-the-art gaze-based imitation learning algorithms, simultaneously learns to predict human visual attention, and generalizes to scenarios not present in the training data.
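A minimal sketch of such a multi-objective setup is shown below, assuming a shared encoder with separate action and gaze heads and a weighted sum of the two losses; the architecture and names are illustrative, not GRIL's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeRegularizedPolicy(nn.Module):
    """Toy two-head network: a shared conv encoder feeding an action head
    and a gaze-prediction head, trained jointly so that visual attention
    shapes the shared representation."""
    def __init__(self, in_channels, n_actions):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
        )
        self.action_head = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_actions))
        self.gaze_head = nn.Conv2d(64, 1, 1)  # per-location gaze logits

    def forward(self, obs):
        h = self.encoder(obs)
        return self.action_head(h), self.gaze_head(h)

def joint_loss(action_logits, gaze_logits, expert_action, gaze_target, beta=0.5):
    # Behavioral cloning on the expert action plus gaze-map regression;
    # gaze_target must match the gaze head's spatial resolution.
    bc = F.cross_entropy(action_logits, expert_action)
    gaze = F.mse_loss(torch.sigmoid(gaze_logits), gaze_target)
    return bc + beta * gaze
```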
arXiv Detail & Related papers (2021-02-25T17:13:13Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
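As background, the successor-feature machinery these methods build on can be sketched as a TD update on feature expectations; the network names and batch layout below are assumptions, and this is not the ITD algorithm itself.

```python
import torch

def successor_feature_td_loss(psi_net, target_psi_net, phi, batch, gamma=0.99):
    """TD update on successor features: psi(s, a) should satisfy
    psi(s, a) = phi(s, a) + gamma * psi(s', a')."""
    psi = psi_net(batch["obs"], batch["act"])                  # (B, D)
    with torch.no_grad():
        next_psi = target_psi_net(batch["next_obs"], batch["next_act"])
        target = phi(batch["obs"], batch["act"]) + gamma * next_psi
    return ((psi - target) ** 2).sum(dim=-1).mean()

# With a recovered reward vector w, rewards and values factorize as
# r(s, a) ≈ phi(s, a) @ w and Q(s, a) ≈ psi(s, a) @ w.
```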
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based InfoNCE (Info Noise-Contrastive Estimation) training.
We propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
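A minimal sketch of InfoNCE extended with class-level positives follows, assuming pseudo-labels mined from a second view or model; the function name and masking scheme are illustrative.

```python
import torch
import torch.nn.functional as F

def infonce_with_class_positives(q, k, pseudo_labels, temperature=0.07):
    """q, k: (B, D) L2-normalized embeddings of two views of the same batch.
    pseudo_labels: (B,) class assignments used to mine extra positives."""
    logits = q @ k.t() / temperature                    # (B, B) similarity
    pos_mask = (pseudo_labels[:, None] == pseudo_labels[None, :]).float()
    log_prob = F.log_softmax(logits, dim=1)
    # Average log-likelihood over all positives (instance + same class).
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1.0)
    return loss.mean()
```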
arXiv Detail & Related papers (2020-10-19T17:59:01Z)
- Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze [19.10872208787867]
Mutual gaze detection plays an important role in understanding human interactions.
We propose a simple and effective approach to boost the performance by using an auxiliary 3D gaze estimation task during the training phase.
We achieve the performance boost without additional labeling cost by training the 3D gaze estimation branch using pseudo 3D gaze labels deduced from mutual gaze labels.
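The pseudo-labeling idea has a simple geometric reading: under a mutual-gaze label, each person's 3D gaze direction roughly points toward the other person's head. A toy NumPy sketch follows, assuming estimated 3D head positions are available; the paper's own label construction may differ.

```python
import numpy as np

def pseudo_3d_gaze_from_mutual_gaze(head_a, head_b):
    """head_a, head_b: (3,) estimated 3D head positions of two people
    labeled as sharing mutual gaze."""
    v = head_b - head_a
    gaze_a = v / np.linalg.norm(v)   # person A looks toward B
    gaze_b = -gaze_a                 # person B looks toward A
    return gaze_a, gaze_b
```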
arXiv Detail & Related papers (2020-10-15T15:01:41Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
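A minimal sketch of a bisimulation-style embedding loss is shown below, where a deterministic next-state distance stands in for the Wasserstein term over transition distributions; the names and this simplification are assumptions.

```python
import torch

def bisimulation_embedding_loss(z_i, z_j, r_i, r_j, z_next_i, z_next_j, gamma=0.99):
    """Match latent distances to behavioral distances: reward difference
    plus a discounted distance between next-state embeddings."""
    d = torch.norm(z_i - z_j, dim=-1)
    with torch.no_grad():
        target = (r_i - r_j).abs() + gamma * torch.norm(z_next_i - z_next_j, dim=-1)
    return ((d - target) ** 2).mean()
```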
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
- Towards Learning to Imitate from a Single Video Demonstration [11.15358253586118]
We develop a reinforcement learning agent that learns to imitate from a given video observation.
We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips.
We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D.
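A toy sketch of a Siamese recurrent reward model follows, assuming fixed-length feature clips; the GRU encoder and the exponentiated-distance similarity are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SiameseClipReward(nn.Module):
    """Shared recurrent encoder for two motion clips; the imitation reward
    is a similarity between the agent's recent frames and the matching
    demonstration segment."""
    def __init__(self, frame_dim, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRU(frame_dim, hidden_dim, batch_first=True)

    def embed(self, clip):                  # clip: (B, T, frame_dim)
        _, h = self.rnn(clip)
        return h[-1]                        # final hidden state, (B, hidden_dim)

    def reward(self, agent_clip, demo_clip):
        za, zd = self.embed(agent_clip), self.embed(demo_clip)
        return torch.exp(-torch.norm(za - zd, dim=-1))  # high when similar
```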
arXiv Detail & Related papers (2019-01-22T06:46:19Z)