Imitating Task and Motion Planning with Visuomotor Transformers
- URL: http://arxiv.org/abs/2305.16309v3
- Date: Tue, 17 Oct 2023 16:34:46 GMT
- Title: Imitating Task and Motion Planning with Visuomotor Transformers
- Authors: Murtaza Dalal, Ajay Mandlekar, Caelan Garrett, Ankur Handa, Ruslan
Salakhutdinov, Dieter Fox
- Abstract summary: Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations.
In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation.
We present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent.
- Score: 71.41938181838124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning is a powerful tool for training robot manipulation
policies, allowing them to learn from expert demonstrations without manual
programming or trial-and-error. However, common methods of data collection,
such as human supervision, scale poorly, as they are time-consuming and
labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously
generate large-scale datasets of diverse demonstrations. In this work, we show
that the combination of large-scale datasets generated by TAMP supervisors and
flexible Transformer models to fit them is a powerful paradigm for robot
manipulation. To that end, we present a novel imitation learning system called
OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a
TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is
specifically curated for imitation learning and can be used to train performant
transformer-based policies. In this paper, we present a thorough study of the
design decisions required to imitate TAMP and demonstrate that OPTIMUS can
solve a wide variety of challenging vision-based manipulation tasks with over
70 different objects, ranging from long-horizon pick-and-place tasks to shelf
and articulated object manipulation, achieving 70 to 80% success rates. Video
results and code at https://mihdalal.github.io/optimus/
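
The abstract describes the core recipe at a high level: a TAMP planner autonomously produces large numbers of demonstrations, and a visuomotor Transformer policy is trained on them by imitation. Below is a minimal behavior-cloning sketch of that recipe; the module name (VisuomotorTransformer), the random tensors standing in for TAMP-generated demonstrations, the network sizes, and the MSE action loss are illustrative assumptions, not the authors' released OPTIMUS implementation.

```python
# Minimal behavior-cloning sketch: a Transformer policy fit to demonstrations
# produced by a TAMP supervisor. All names, shapes, and hyperparameters are
# hypothetical stand-ins, NOT the OPTIMUS codebase.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class VisuomotorTransformer(nn.Module):
    def __init__(self, action_dim=7, embed_dim=256, n_layers=4, n_heads=8, context_len=10):
        super().__init__()
        # Encode each camera frame into a single token (assumes 64x64 RGB input).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.pos_embed = nn.Parameter(torch.zeros(1, context_len, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(embed_dim, action_dim)

    def forward(self, frames):
        # frames: (batch, context_len, 3, H, W) -> one predicted action per timestep
        b, t = frames.shape[:2]
        tokens = self.image_encoder(frames.flatten(0, 1)).view(b, t, -1)
        tokens = tokens + self.pos_embed[:, :t]
        return self.action_head(self.backbone(tokens))


# Random tensors standing in for (image sequence, action sequence) pairs that a
# TAMP supervisor would export after its rollouts are filtered for imitation.
frames = torch.randn(32, 10, 3, 64, 64)
actions = torch.randn(32, 10, 7)
loader = DataLoader(TensorDataset(frames, actions), batch_size=8, shuffle=True)

policy = VisuomotorTransformer()
opt = torch.optim.AdamW(policy.parameters(), lr=1e-4)
for obs, act in loader:  # standard behavior-cloning loop
    loss = nn.functional.mse_loss(policy(obs), act)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the real system the observations would come from rendered TAMP rollouts and the action head would match the paper's action parameterization; this sketch only fixes the overall training-loop structure of imitating a TAMP agent.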
Related papers
- Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers [41.069074375686164]
We propose Heterogeneous Pre-trained Transformers (HPT), which pre-train the trunk of a policy neural network to learn a representation shared across tasks and embodiments.
We conduct experiments to investigate the scaling behavior of the training objectives across up to 52 datasets.
HPTs outperform several baselines and enhance the fine-tuned policy performance by over 20% on unseen tasks.
arXiv Detail & Related papers (2024-09-30T17:39:41Z)
- Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks [0.0]
In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation.
We propose a model-invariant training alternative that improves the models' performance in a simulator by up to 55%.
Our work thus also sheds light on the potential benefits and limitations of using the current multimodal VAEs for unsupervised learning of robotic motion trajectories.
arXiv Detail & Related papers (2024-04-02T13:25:16Z)
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
- DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability [58.75803543245372]
Task and Motion Planning (TAMP) approaches are suited for planning multi-step autonomous robot manipulation.
We propose to overcome the limitations of such approaches under partial observability by composing diffusion models with a TAMP system.
We show how the combination of classical TAMP, generative modeling, and latent embedding enables multi-step constraint-based reasoning.
arXiv Detail & Related papers (2023-06-22T20:40:24Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data [18.041329181385414]
Conditional Behavior Transformers (C-BeT) is a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification.
C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%.
We demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data.
arXiv Detail & Related papers (2022-10-18T17:59:55Z)
- Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks [12.604533231243543]
Transporters with Visual Foresight (TVF) achieves multi-task learning and zero-shot generalization to unseen tasks.
TVF improves the performance of a state-of-the-art imitation learning method on both training and unseen tasks in simulation and real-robot experiments.
arXiv Detail & Related papers (2022-02-22T09:35:09Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.