Generalizable Imitation Learning Through Pre-Trained Representations
- URL: http://arxiv.org/abs/2311.09350v1
- Date: Wed, 15 Nov 2023 20:15:51 GMT
- Title: Generalizable Imitation Learning Through Pre-Trained Representations
- Authors: Wei-Di Chang, Francois Hogan, David Meger, and Gregory Dudek
- Abstract summary: We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations.
Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types.
- Score: 19.98418419179064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in Imitation Learners.
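To make the abstract's keypoint idea concrete, here is a minimal sketch: extract DINO ViT patch-level embeddings, cluster them into semantic concepts, and read off one keypoint per concept as input to a behaviour-cloning policy. The torch.hub entry point and `get_intermediate_layers` helper come from the public DINO release; the cluster count, centroid-based keypoint definition, and policy head are illustrative assumptions, not the authors' released BC-ViT implementation.

```python
# Minimal sketch of the keypoint pipeline described in the abstract:
# cluster DINO ViT patch embeddings into semantic concepts, then use one
# centroid keypoint per concept as input to a behaviour-cloning policy.
# K, the keypoint definition, and the policy head are assumptions.
import torch
from sklearn.cluster import KMeans

# Downloads pretrained DINO ViT-S/8 weights from the public release.
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')
model.eval()

img = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed camera frame
with torch.no_grad():
    tokens = model.get_intermediate_layers(img, n=1)[0]  # (1, 1 + N, D)
patches = tokens[0, 1:]  # drop the CLS token -> (N, D), N = 28*28 for ViT-S/8

# Cluster patch embeddings into K semantic concepts.
K = 8
labels = torch.from_numpy(
    KMeans(n_clusters=K, n_init=10).fit_predict(patches.numpy()))

# One keypoint per concept: mean (row, col) of the patch cells assigned to it.
grid = int(patches.shape[0] ** 0.5)
coords = torch.stack(torch.meshgrid(
    torch.arange(grid), torch.arange(grid), indexing='ij'),
    dim=-1).reshape(-1, 2).float()
keypoints = torch.stack([coords[labels == k].mean(0) for k in range(K)])  # (K, 2)

# A behaviour-cloning head can then regress actions from the keypoints.
policy = torch.nn.Sequential(torch.nn.Linear(2 * K, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 7))  # e.g. a 7-DoF action
action = policy(keypoints.flatten())
```

Because the clusters are formed over semantic patch features rather than raw pixels, keypoints defined this way can remain stable under the appearance variations the abstract describes.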
Related papers
- Learning and Leveraging World Models in Visual Representation Learning [34.81177885432796]
Joint-Embedding Predictive Architecture (JEPA) has emerged as a promising self-supervised approach that learns by leveraging a world model.
We introduce Image World Models, an approach that goes beyond masked image modeling and learns to predict the effect of global photometric transformations in latent space (a toy version appears as the first sketch after this list).
arXiv Detail & Related papers (2024-03-01T13:05:38Z)
- Invariance is Key to Generalization: Examining the Role of Representation in Sim-to-Real Transfer for Visual Navigation [35.01394611106655]
Key to generalization is representations that are rich enough to capture all task-relevant information.
We experimentally study such a representation for visual navigation.
We show that our representation reduces the A-distance between the training and test domains.
arXiv Detail & Related papers (2023-10-23T15:15:19Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning by jointly optimizing a reinforcement learning policy and an inverse dynamics prediction objective (the second sketch after this list shows the inverse dynamics term).
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models.
SgVA produces discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation.
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.
arXiv Detail & Related papers (2022-11-28T14:58:15Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Representation Learning for Out-Of-Distribution Generalization in Reinforcement Learning [39.21650402977466]
This paper aims to establish the first systematic characterization of the usefulness of learned representations for real-world downstream tasks.
By training over 10,000 reinforcement learning policies, we extensively evaluate to what extent different representation properties affect out-of-distribution generalization.
We demonstrate zero-shot transfer of these policies from simulation to the real world, without any domain randomization or fine-tuning.
arXiv Detail & Related papers (2021-07-12T18:49:48Z)
- A Self-Supervised Framework for Function Learning and Extrapolation [1.9374999427973014]
We present a framework for how a learner may acquire representations that support generalization.
We show the resulting representations outperform those from other models for unsupervised time series learning.
arXiv Detail & Related papers (2021-06-14T12:41:03Z)
- Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
- Guided Variational Autoencoder for Disentanglement Learning [79.02010588207416]
We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.
We design an unsupervised strategy and a supervised strategy in Guided-VAE and observe enhanced modeling and controlling capability over the vanilla VAE.
arXiv Detail & Related papers (2020-04-02T20:49:15Z)
- Revisiting Meta-Learning as Supervised Learning [69.2067288158133]
We aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning.
By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning (the third sketch after this list gives a toy version of this reduction).
This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning.
arXiv Detail & Related papers (2020-02-03T06:13:01Z)
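First sketch, for the "Learning and Leveraging World Models" entry above: a toy latent-prediction objective in the spirit of Image World Models, where the model encodes an image and predicts, in latent space, the embedding of a photometrically transformed copy, conditioned on the transform parameter. The encoder size and the brightness transform are assumptions, not the paper's architecture.

```python
# Toy latent-space prediction of a global photometric transform's effect.
# Encoder/predictor sizes and the brightness shift are assumptions.
import torch

enc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
pred = torch.nn.Linear(128 + 1, 128)  # conditioned on the transform parameter
opt = torch.optim.Adam(list(enc.parameters()) + list(pred.parameters()), lr=1e-3)

x = torch.rand(16, 3, 32, 32)          # stand-in image batch
delta = torch.rand(16, 1) * 0.4 - 0.2  # per-image brightness shift
x_aug = (x + delta.view(-1, 1, 1, 1)).clamp(0, 1)

with torch.no_grad():                  # simple stop-gradient target
    target = enc(x_aug)
z_pred = pred(torch.cat([enc(x), delta], dim=1))
loss = torch.nn.functional.mse_loss(z_pred, target)
opt.zero_grad(); loss.backward(); opt.step()
```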
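Second sketch, for the ALP entry: ALP pairs a reinforcement learning policy loss with inverse dynamics prediction; shown here is only the inverse dynamics term, which predicts the action taken between two consecutive observations. The shapes, the continuous action space, and the MSE loss are assumptions, not ALP's full training loop.

```python
# Toy inverse dynamics objective (used alongside an RL policy loss in
# action-aware representation learning). Shapes and encoder are assumptions.
import torch

obs_dim, act_dim, feat_dim = 32, 4, 16

encoder = torch.nn.Sequential(torch.nn.Linear(obs_dim, feat_dim), torch.nn.ReLU())
inv_head = torch.nn.Linear(2 * feat_dim, act_dim)  # predicts a_t from (s_t, s_{t+1})
opt = torch.optim.Adam(list(encoder.parameters()) + list(inv_head.parameters()),
                       lr=1e-3)

# Pretend batch of transitions (s_t, a_t, s_{t+1}) from an agent's experience.
s_t, a_t, s_tp1 = torch.randn(64, obs_dim), torch.randn(64, act_dim), torch.randn(64, obs_dim)

pred_a = inv_head(torch.cat([encoder(s_t), encoder(s_tp1)], dim=1))
loss = torch.nn.functional.mse_loss(pred_a, a_t)  # continuous actions assumed
opt.zero_grad(); loss.backward(); opt.step()
```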
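Third sketch, for the "Revisiting Meta-Learning as Supervised Learning" entry: a toy instance of the (data set, target model) -> (feature, label) reduction, where a task's support set is the "feature" and the parameters of the model fitted to that task are the "label". The sine task family and the choice to regress task parameters are illustrative assumptions.

```python
# Toy reduction of meta-learning to supervised learning: support set in,
# task parameters out. Task family and architecture are assumptions.
import torch

def make_task():
    """Sine task y = a*sin(x+b); return its support set and true (a, b)."""
    a, b = torch.rand(()) * 2 + 0.5, torch.rand(()) * 3.14
    x = torch.rand(16, 1) * 6 - 3
    return torch.cat([x, a * torch.sin(x + b)], dim=1), torch.stack([a, b])

# "Supervised" learner: maps a flattened support set to task parameters.
meta = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                           torch.nn.Linear(64, 2))
opt = torch.optim.Adam(meta.parameters(), lr=1e-3)

for step in range(1000):
    feats, params = zip(*[make_task() for _ in range(8)])
    x = torch.stack([f.flatten() for f in feats])  # (8, 32) "features"
    y = torch.stack(params)                        # (8, 2) "labels"
    loss = torch.nn.functional.mse_loss(meta(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```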