DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
- URL: http://arxiv.org/abs/2409.12192v2
- Date: Wed, 30 Oct 2024 18:48:00 GMT
- Title: DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
- Authors: Zichen Jeff Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, Lerrel Pinto
- Abstract summary: Imitation learning has proven to be a powerful tool for training complex visuomotor policies.
Current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations.
We present DynaMo, a new in-domain, self-supervised method for learning visual representations.
- Score: 18.737628473949048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. A key reason for this poor data efficiency is that visual representations are predominantly either pretrained on out-of-domain data or trained directly through a behavior cloning objective. In this work, we present DynaMo, a new in-domain, self-supervised method for learning visual representations. Given a set of expert demonstrations, we jointly learn a latent inverse dynamics model and a forward dynamics model over a sequence of image embeddings, predicting the next frame in latent space, without augmentations, contrastive sampling, or access to ground truth actions. Importantly, DynaMo does not require any out-of-domain data such as Internet datasets or cross-embodied datasets. On a suite of six simulated and real environments, we show that representations learned with DynaMo significantly improve downstream imitation learning performance over prior self-supervised learning objectives, and pretrained representations. Gains from using DynaMo hold across policy classes such as Behavior Transformer, Diffusion Policy, MLP, and nearest neighbors. Finally, we ablate over key components of DynaMo and measure its impact on downstream policy performance. Robot videos are best viewed at https://dynamo-ssl.github.io
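For intuition, below is a minimal sketch of what a joint inverse/forward dynamics objective over image embeddings, as described in the abstract, could look like. This is an illustrative assumption, not the paper's implementation: module names, network sizes, and the stop-gradient on the target are placeholders chosen for a self-contained example.

```python
# Minimal sketch of a DynaMo-style joint dynamics objective (illustrative only;
# architectures, shapes, and the target detach are assumptions, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicsPretrainer(nn.Module):
    def __init__(self, obs_dim=3 * 128 * 128, embed_dim=256, latent_action_dim=16):
        super().__init__()
        # Visual encoder: maps an observation o_t to a latent embedding z_t.
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(obs_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim)
        )
        # Inverse dynamics: infers a latent "action" from consecutive embeddings.
        self.inverse = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(), nn.Linear(256, latent_action_dim)
        )
        # Forward dynamics: predicts the next embedding from (z_t, latent action).
        self.forward_model = nn.Sequential(
            nn.Linear(embed_dim + latent_action_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )

    def loss(self, obs_t, obs_tp1):
        z_t = self.encoder(obs_t)
        z_tp1 = self.encoder(obs_tp1)
        a_latent = self.inverse(torch.cat([z_t, z_tp1], dim=-1))
        z_tp1_pred = self.forward_model(torch.cat([z_t, a_latent], dim=-1))
        # Predict the next frame in latent space; no ground-truth actions,
        # augmentations, or contrastive negatives are used in this sketch.
        # The detach is a toy safeguard against collapse; the paper's actual
        # regularization may differ.
        return F.mse_loss(z_tp1_pred, z_tp1.detach())

# Usage sketch: consecutive frames drawn from in-domain demonstrations.
model = DynamicsPretrainer()
obs_t = torch.randn(8, 3, 128, 128)
obs_tp1 = torch.randn(8, 3, 128, 128)
loss = model.loss(obs_t, obs_tp1)
loss.backward()
```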
Related papers
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability [25.943330238941602]
We propose a visual model-based RL method that learns a latent representation resilient to spurious variations.
Our training objective encourages the representation to be maximally predictive of dynamics and reward.
Our effort is a step towards making model-based RL a practical and useful tool for dynamic, diverse domains.
arXiv Detail & Related papers (2023-08-31T18:43:04Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Palm up: Playing in the Latent Manifold for Unsupervised Pretraining [31.92145741769497]
We propose an algorithm that exhibits exploratory behavior while utilizing large, diverse datasets.
Our key idea is to leverage deep generative models that are pretrained on static datasets and introduce a dynamic model in the latent space.
We then employ an unsupervised reinforcement learning algorithm to explore in this environment and perform unsupervised representation learning on the collected data.
arXiv Detail & Related papers (2022-10-19T22:26:12Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction [24.72947291987545]
A key challenge for an agent learning to interact with the world is to reason about the physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
arXiv Detail & Related papers (2020-08-02T11:04:49Z)