Visual Imitation Learning with Calibrated Contrastive Representation
- URL: http://arxiv.org/abs/2401.11396v1
- Date: Sun, 21 Jan 2024 04:18:30 GMT
- Title: Visual Imitation Learning with Calibrated Contrastive Representation
- Authors: Yunke Wang, Linwei Tao, Bo Du, Yutian Lin, Chang Xu
- Abstract summary: Adversarial Imitation Learning (AIL) allows the agent to reproduce expert behavior with low-dimensional states and actions.
This paper proposes a simple and effective solution by incorporating calibrated contrastive representation learning into the visual AIL framework.
- Score: 44.63125396964309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial Imitation Learning (AIL) allows the agent to reproduce expert behavior from low-dimensional states and actions. However, challenges arise in handling visual states, whose representations are less distinguishable than low-dimensional proprioceptive features. While existing methods resort to adopting complex network architectures or separating representation learning from decision-making, they overlook the valuable intra-agent information within demonstrations. To address this problem, this paper proposes a simple and effective solution that incorporates calibrated contrastive representation learning into the visual AIL framework. Specifically, we introduce an image encoder into visual AIL, using a combination of unsupervised and supervised contrastive learning to extract valuable features from visual states. Since the agent produces demonstrations of varying quality as it improves, we calibrate the contrastive loss by treating each agent demonstration as a mixed sample. The contrastive learning objective is jointly optimized with the AIL framework, without modifying the architecture or incurring significant computational cost. Experimental results on the DMControl Suite demonstrate that our method is sample-efficient and outperforms the compared methods in several respects.
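To make the objective concrete, below is a minimal sketch of how such a combined contrastive loss could look in PyTorch. It is not the authors' implementation: the function names, the temperature value, the soft-target construction, and the use of a discriminator-style quality score in [0, 1] for agent frames are all assumptions made for illustration, under one plausible reading of "treating each agent demonstration as a mixed sample".

```python
# Hypothetical sketch of a combined unsupervised + calibrated supervised
# contrastive objective for an image encoder (not the paper's code).
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Unsupervised term: two augmented views of the same frame are positives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # view i matches view i
    return F.cross_entropy(logits, targets)

def calibrated_supcon(z, quality, temperature=0.1):
    """Supervised term with soft ('calibrated') targets. `quality` in [0, 1]
    is the assumed expertise of each sample: 1.0 for expert frames, e.g. a
    discriminator score for agent frames, so agent data acts as a mixture."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, -1e9)          # exclude self-similarity
    # Soft probability that samples i and j share the same (expert / non-expert)
    # label, treating each label as a Bernoulli with parameter quality[i].
    q = quality.unsqueeze(0)
    p = q.t() * q + (1 - q.t()) * (1 - q)     # (B, B) pairwise soft targets
    p = p.masked_fill(eye, 0.0)
    p = p / p.sum(dim=1, keepdim=True).clamp_min(1e-8)  # normalize rows
    log_prob = sim.log_softmax(dim=1)
    return -(p * log_prob).sum(dim=1).mean()
```

In this reading, both terms would simply be added, with a weighting coefficient, to the losses of the underlying AIL algorithm, so the encoder is trained jointly rather than in a separate representation-learning stage, consistent with the abstract's claim that no architectural change is required.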
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models are transferable to black-box models, improving model performance.
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL).
Our method achieves near-expert performance on various visual control tasks with complex observations and the more challenging tasks with different backgrounds from expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z)
- Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL [16.792949555151978]
Learning self-supervised representations using reconstruction or contrastive losses improves performance and sample complexity of image-based and multimodal reinforcement learning (RL).
However, different self-supervised loss functions have distinct advantages and limitations depending on the information density of the underlying sensor modality.
We propose Contrastive Reconstructive Aggregated representation Learning (CoRAL), a unified framework enabling us to choose the most appropriate self-supervised loss for each sensor modality.
arXiv Detail & Related papers (2023-02-10T15:57:20Z)
- Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem [60.0878532426877]
We propose a novel collaborative learning scheme from the viewpoint of visual perturbation calibration.
Specifically, we devise a visual controller to construct two sorts of curated images with different perturbation extents.
The experimental results on two diagnostic VQA-CP benchmark datasets clearly demonstrate its effectiveness.
arXiv Detail & Related papers (2022-07-24T23:50:52Z)
- An Empirical Investigation of Representation Learning for Imitation [76.48784376425911]
Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data.
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation.
arXiv Detail & Related papers (2022-05-16T11:23:42Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment to learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- On Mutual Information in Contrastive Learning for Visual Representations [19.136685699971864]
unsupervised, "contrastive" learning algorithms in vision have been shown to learn representations that perform remarkably well on transfer tasks.
We show that this family of algorithms maximizes a lower bound on the mutual information between two or more "views" of an image.
We find that the choice of negative samples and views are critical to the success of these algorithms.
arXiv Detail & Related papers (2020-05-27T04:21:53Z)
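For the last entry, the lower bound in question is the standard InfoNCE bound. As a sketch (notation assumed here rather than taken from the paper, with K the number of samples contrasted against and v_1, v_2 two "views" of the same image):

```latex
% Standard InfoNCE bound (notation assumed, not taken from the paper):
% minimizing the InfoNCE loss over K candidate samples tightens a lower
% bound on the mutual information between two views of the same image.
\[
  I(v_1; v_2) \;\ge\; \log K - \mathcal{L}_{\mathrm{InfoNCE}}
\]
% Since the loss is non-negative, the bound saturates at log K, which is
% why more negatives (larger K) are needed to certify more information.
```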