Resolving Copycat Problems in Visual Imitation Learning via Residual
Action Prediction
- URL: http://arxiv.org/abs/2207.09705v1
- Date: Wed, 20 Jul 2022 07:15:32 GMT
- Title: Resolving Copycat Problems in Visual Imitation Learning via Residual
Action Prediction
- Authors: Chia-Chi Chuang, Donglin Yang, Chuan Wen, Yang Gao
- Abstract summary: We show that imitation from observation histories performs worse than imitation from the most recent observation.
We propose a novel imitation learning neural network architecture that does not suffer from this issue by design.
- Score: 10.275717930942989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning is a widely used policy learning method that enables
intelligent agents to acquire complex skills from expert demonstrations. The
input to the imitation learning algorithm is usually composed of both the
current observation and historical observations since the most recent
observation might not contain enough information. This is especially the case
with image observations: a single image shows only one view of the scene,
lacks motion information, and is subject to object occlusions.
In theory, providing multiple observations to the imitation learning agent will
lead to better performance. Surprisingly, however, imitation from observation
histories sometimes performs worse than imitation from the most recent
observation alone. In this paper, we explain this phenomenon from the
perspective of information flow within the neural network. We also propose a novel
imitation learning neural network architecture that does not suffer from this
issue by design. Furthermore, our method scales to high-dimensional image
observations. Finally, we benchmark our approach on two widely used simulators,
CARLA and MuJoCo, and it successfully alleviates the copycat problem and
surpasses the existing solutions.
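As a concrete illustration of the residual idea, the sketch below trains a history branch to reproduce the previous expert action while a separate branch, which sees only the current observation, predicts the residual needed to reach the current action. All module names, layer sizes, and the unweighted loss sum are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ResidualActionPolicy(nn.Module):
    """Sketch: a history branch predicts the previous action; a branch that
    sees only the current observation predicts the residual on top of it."""

    def __init__(self, obs_dim: int, act_dim: int, history_len: int, hidden: int = 128):
        super().__init__()
        # History branch: the part that could "copy" the expert's previous
        # action, so it is supervised to do exactly (and only) that.
        self.history_encoder = nn.Sequential(
            nn.Linear(obs_dim * history_len, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )
        # Current-observation branch: by construction it must extract the new
        # information needed to move from the previous action to the current one.
        self.residual_head = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, history: torch.Tensor, current: torch.Tensor):
        prev_action_pred = self.history_encoder(history.flatten(1))
        residual = self.residual_head(current)
        return prev_action_pred, prev_action_pred + residual

def imitation_loss(policy, history, current, prev_action, action):
    # Supervise both the previous-action prediction and the final action.
    prev_pred, action_pred = policy(history, current)
    return (nn.functional.mse_loss(prev_pred, prev_action)
            + nn.functional.mse_loss(action_pred, action))
```

Because the residual branch never sees the history, the network cannot satisfy the loss by merely copying the previous action, which is exactly the shortcut behind the copycat problem.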
Related papers
- Towards Principled Representation Learning from Videos for Reinforcement Learning [23.877731515619868]
We study pre-training representations for decision-making using video data, focusing on learning the latent state representations of the underlying MDP.
arXiv Detail & Related papers (2024-03-20T17:28:17Z)
- Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, unsupervised learning on diffusion-generated images remains underexplored.
We introduce customized solutions that fully exploit the free attention masks these models produce.
arXiv Detail & Related papers (2023-08-13T10:07:46Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which underpins several human cognitive functions.
The proposed architecture ingests images and returns scanpaths: sequences of points with a high likelihood of attracting viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
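As a generic illustration of an image-to-scanpath model (not the paper's domain-adaptive architecture; every layer and size below is an assumption), one can encode the image with a CNN and unroll a recurrent cell that emits a sequence of normalized fixation points:

```python
import torch
import torch.nn as nn

class ScanpathPredictor(nn.Module):
    """Generic sketch: encode an image with a small CNN, then unroll a GRU
    cell that emits a sequence of normalized (x, y) fixation points."""

    def __init__(self, seq_len: int = 10, hidden: int = 256):
        super().__init__()
        self.seq_len = seq_len
        self.encoder = nn.Sequential(   # tiny stand-in for a real backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rnn = nn.GRUCell(2, hidden)      # input: the previous fixation
        self.to_point = nn.Linear(hidden, 2)  # next (x, y), normalized

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        h = self.encoder(image)                        # (B, hidden)
        point = torch.full((image.size(0), 2), 0.5,
                           device=image.device)        # start at the center
        points = []
        for _ in range(self.seq_len):
            h = self.rnn(point, h)
            point = torch.sigmoid(self.to_point(h))    # keep in [0, 1]
            points.append(point)
        return torch.stack(points, dim=1)              # (B, seq_len, 2)
```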
- SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z)
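The backbone modernization described above can be sketched generically: a 3D convolution extracts spatio-temporal features, and a multi-head self-attention layer mixes them globally. The block below is a minimal PyTorch sketch under assumed shapes, not the SSMTL++ configuration:

```python
import torch
import torch.nn as nn

class Conv3dSelfAttentionBlock(nn.Module):
    """Sketch: a 3D convolution extracts spatio-temporal features, then
    multi-head self-attention mixes them globally across all positions."""

    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W); note attention is over all T*H*W tokens,
        # which is only practical for small feature maps.
        x = torch.relu(self.conv(x))
        b, c, t, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, T*H*W, C)
        tokens = self.norm(tokens)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = tokens + attended                     # residual connection
        return tokens.transpose(1, 2).reshape(b, c, t, h, w)
```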
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- The Surprising Effectiveness of Representation Learning for Visual Imitation [12.60653315718265]
We propose to decouple representation learning from behavior learning for visual imitation.
First, we learn a visual representation encoder from offline data using standard supervised and self-supervised learning methods.
We experimentally show that this simple decoupling improves the performance of visual imitation models on both offline demonstration datasets and real-robot door opening compared to prior work in visual imitation.
arXiv Detail & Related papers (2021-12-02T18:58:09Z)
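The decoupling described in this entry is straightforward to sketch: obtain a visual encoder (via supervised or self-supervised pre-training), freeze it, and fit only a small policy head with behavior cloning. Everything below, including the tiny stand-in encoder and the MSE objective, is an illustrative assumption:

```python
import torch
import torch.nn as nn

# Stage 1: obtain and freeze a visual encoder. In the paper's setting this
# would come from standard supervised or self-supervised pre-training on
# offline data; here a tiny untrained CNN stands in for it.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in encoder.parameters():
    p.requires_grad = False        # representation learning stays decoupled

# Stage 2: behavior learning fits only a small policy head on frozen features.
act_dim = 7                        # illustrative action dimensionality
policy_head = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, act_dim))

def bc_loss(obs: torch.Tensor, expert_action: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():          # no gradients flow into the encoder
        features = encoder(obs)
    return nn.functional.mse_loss(policy_head(features), expert_action)
```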
- Understanding invariance via feedforward inversion of discriminatively trained classifiers [30.23199531528357]
Past research has discovered that some extraneous visual detail remains in the output logits.
We develop a feedforward inversion model that produces remarkably high fidelity reconstructions.
Our approach is based on BigGAN, with conditioning on logits instead of one-hot class labels.
arXiv Detail & Related papers (2021-03-15T17:56:06Z)
- Imitation Learning with Human Eye Gaze via Multi-Objective Prediction [3.5779268406205618]
We propose Gaze Regularized Imitation Learning (GRIL), a novel context-aware imitation learning architecture.
GRIL learns concurrently from both human demonstrations and eye gaze to solve tasks where visual attention provides important context.
We show that GRIL outperforms several state-of-the-art gaze-based imitation learning algorithms, simultaneously learns to predict human visual attention, and generalizes to scenarios not present in the training data.
arXiv Detail & Related papers (2021-02-25T17:13:13Z)
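A minimal sketch of the multi-objective idea behind GRIL: a shared encoder feeds an action head and a gaze head, and the two losses are combined with a weighting term. Layer sizes, the sigmoid gaze parameterization, and the weight are assumptions, not the GRIL implementation:

```python
import torch
import torch.nn as nn

class GazeRegularizedPolicy(nn.Module):
    """Sketch: one shared encoder with two heads, one predicting actions
    and one predicting normalized gaze coordinates."""

    def __init__(self, act_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.action_head = nn.Linear(64, act_dim)
        self.gaze_head = nn.Linear(64, 2)   # normalized (x, y) gaze point

    def forward(self, obs: torch.Tensor):
        z = self.encoder(obs)
        return self.action_head(z), torch.sigmoid(self.gaze_head(z))

def gril_style_loss(model, obs, expert_action, expert_gaze, gaze_weight=0.5):
    # Learn concurrently from demonstrated actions and recorded eye gaze.
    action_pred, gaze_pred = model(obs)
    return (nn.functional.mse_loss(action_pred, expert_action)
            + gaze_weight * nn.functional.mse_loss(gaze_pred, expert_gaze))
```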
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single-image deblurring is feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)
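The self-supervision signal of a differentiable reblur model can be sketched as follows: if the network predicts a sharp image and a blur kernel, re-blurring the prediction with that kernel should reproduce the blurry input. The helper below (tensor names and shapes are assumptions) implements that check with a grouped convolution:

```python
import torch
import torch.nn.functional as F

def reblur_loss(blurry: torch.Tensor, sharp_pred: torch.Tensor,
                kernel_pred: torch.Tensor) -> torch.Tensor:
    """Re-blurring the predicted sharp image with the predicted kernel
    should reproduce the blurry input. `sharp_pred` is (B, C, H, W) and
    `kernel_pred` is (B, 1, k, k), assumed normalized (e.g. via softmax)
    with odd k; both would come from a deblurring network."""
    b, c, h, w = sharp_pred.shape
    k = kernel_pred.shape[-1]
    # Grouped conv applies each sample's kernel to each of its channels.
    inp = sharp_pred.reshape(1, b * c, h, w)
    weight = kernel_pred.repeat_interleave(c, dim=0)     # (B*C, 1, k, k)
    reblurred = F.conv2d(inp, weight, padding=k // 2, groups=b * c)
    return F.mse_loss(reblurred.reshape(b, c, h, w), blurry)
```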
- Towards Learning to Imitate from a Single Video Demonstration [11.15358253586118]
We develop a reinforcement learning agent that can learn to imitate from a given video observation.
We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips.
We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D.
arXiv Detail & Related papers (2019-01-22T06:46:19Z)
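A minimal sketch of the Siamese recurrent reward idea from the entry above: a shared GRU embeds the agent's clip and the demonstration clip, and the negative embedding distance serves as the imitation reward. Per-frame features are assumed to be pre-extracted, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

class SiameseClipReward(nn.Module):
    """Sketch: embed two motion clips with a shared recurrent encoder and
    use negative embedding distance as the imitation reward."""

    def __init__(self, feat_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)

    def embed(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, feat_dim) pre-extracted per-frame features
        _, h = self.rnn(clip)
        return h[-1]                       # (B, hidden) final hidden state

    def reward(self, agent_clip: torch.Tensor,
               demo_clip: torch.Tensor) -> torch.Tensor:
        d = torch.norm(self.embed(agent_clip) - self.embed(demo_clip), dim=-1)
        return -d                          # closer clips -> higher reward
```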
This list is automatically generated from the titles and abstracts of the papers on this site.