Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation
- URL: http://arxiv.org/abs/2305.16985v2
- Date: Wed, 25 Oct 2023 17:59:44 GMT
- Title: Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation
- Authors: David Brandfonbrener, Ofir Nachum, Joan Bruna
- Abstract summary: We evaluate how the pretrain-then-transfer paradigm should be applied in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
- Score: 66.86987509942607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, domains such as natural language processing and image
recognition have popularized the paradigm of using large datasets to pretrain
representations that can be effectively transferred to downstream tasks. In
this work we evaluate how such a paradigm should be done in imitation learning,
where both pretraining and finetuning data are trajectories collected by
experts interacting with an unknown environment. Namely, we consider a setting
where the pretraining corpus consists of multitask demonstrations and the task
for each demonstration is set by an unobserved latent context variable. The
goal is to use the pretraining corpus to learn a low-dimensional representation
of the high-dimensional (e.g., visual) observation space, which can be
transferred to a novel context for finetuning on a limited dataset of
demonstrations. Among a variety of possible pretraining objectives, we argue
that inverse dynamics modeling -- i.e., predicting an action given the
observations appearing before and after it in the demonstration -- is
well-suited to this setting. We provide empirical evidence of this claim
through evaluations on a variety of simulated visuomotor manipulation problems.
While previous work has attempted various theoretical explanations regarding
the benefit of inverse dynamics modeling, we find that these arguments are
insufficient to explain the empirical advantages often observed in our
settings, and so we derive a novel analysis using a simple but general
environment model.
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Learning invariant representations of time-homogeneous stochastic dynamical systems [27.127773672738535]
We study the problem of learning a representation of the state that faithfully captures its dynamics.
This is instrumental to learning the transfer operator or the generator of the system.
We show that the search for a good representation can be cast as an optimization problem over neural networks.
arXiv Detail & Related papers (2023-07-19T11:32:24Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks, they put less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- On the Viability of Monocular Depth Pre-training for Semantic Segmentation [48.29060171161375]
We study whether pre-training on geometric tasks is viable for downstream transfer to semantic tasks.
We find that monocular depth is a viable form of pre-training for semantic segmentation, validated by improvements over common baselines.
arXiv Detail & Related papers (2022-03-26T04:27:28Z)
- On Contrastive Representations of Stochastic Processes [53.21653429290478]
Learning representations of stochastic processes is an emerging problem in machine learning.
We show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes.
arXiv Detail & Related papers (2021-06-18T11:00:24Z)
- Video Prediction via Example Guidance [156.08546987158616]
In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.
In this work, we propose a simple yet effective framework that can efficiently predict plausible future states.
arXiv Detail & Related papers (2020-07-03T14:57:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.