Imitating Task and Motion Planning with Visuomotor Transformers
- URL: http://arxiv.org/abs/2305.16309v3
- Date: Tue, 17 Oct 2023 16:34:46 GMT
- Title: Imitating Task and Motion Planning with Visuomotor Transformers
- Authors: Murtaza Dalal, Ajay Mandlekar, Caelan Garrett, Ankur Handa, Ruslan
Salakhutdinov, Dieter Fox
- Abstract summary: Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations.
In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation.
We present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent.
- Score: 71.41938181838124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning is a powerful tool for training robot manipulation
policies, allowing them to learn from expert demonstrations without manual
programming or trial-and-error. However, common methods of data collection,
such as human supervision, scale poorly, as they are time-consuming and
labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously
generate large-scale datasets of diverse demonstrations. In this work, we show
that the combination of large-scale datasets generated by TAMP supervisors and
flexible Transformer models to fit them is a powerful paradigm for robot
manipulation. To that end, we present a novel imitation learning system called
OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a
TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is
specifically curated for imitation learning and can be used to train performant
transformer-based policies. In this paper, we present a thorough study of the
design decisions required to imitate TAMP and demonstrate that OPTIMUS can
solve a wide variety of challenging vision-based manipulation tasks with over
70 different objects, ranging from long-horizon pick-and-place tasks to shelf
and articulated object manipulation, achieving 70 to 80% success rates. Video
results and code at https://mihdalal.github.io/optimus/
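For illustration, the core paradigm the abstract describes (an autonomous planner generates demonstrations; a policy is trained to imitate them) can be sketched in a few lines. This is a hypothetical, minimal behavior-cloning sketch: a scripted linear "supervisor" stands in for the TAMP planner, and an ordinary least-squares regressor stands in for the visuomotor Transformer policy that OPTIMUS actually trains on image observations.

```python
import numpy as np

rng = np.random.default_rng(0)

def planner_supervisor(obs):
    # Stand-in expert: a fixed linear feedback law mapping state to action.
    # (In OPTIMUS this role is played by a full TAMP system.)
    W_true = np.array([[1.0, -0.5], [0.3, 0.8]])
    return obs @ W_true.T

# 1. The supervisor autonomously generates a large demonstration dataset.
observations = rng.normal(size=(500, 2))
actions = planner_supervisor(observations)

# 2. Behavior cloning: fit a policy to the demonstrations
#    (least squares here; a Transformer in the actual system).
W_policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# 3. The cloned policy reproduces the supervisor on held-out states.
test_obs = rng.normal(size=(100, 2))
imitation_error = np.max(np.abs(test_obs @ W_policy - planner_supervisor(test_obs)))
print(f"max imitation error: {imitation_error:.2e}")
```

The sketch omits everything that makes the real problem hard (image observations, long horizons, multimodal action distributions), but it shows why a noiseless, scalable supervisor is attractive: the imitation target is consistent, so the cloned policy recovers it almost exactly.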
Related papers
- Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks [0.0]
In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation.
We propose a model-invariant training alternative that improves the models' performance in a simulator by up to 55%.
Our work thus also sheds light on the potential benefits and limitations of using the current multimodal VAEs for unsupervised learning of robotic motion trajectories.
arXiv Detail & Related papers (2024-04-02T13:25:16Z) - Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z) - Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z) - From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data [18.041329181385414]
Conditional Behavior Transformers (C-BeT) is a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification.
C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%.
We demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data.
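For illustration, the future-conditioned goal specification that C-BeT describes can be sketched with a hypothetical goal-conditioned behavior-cloning setup: the policy input is the current observation concatenated with a desired future state, and a least-squares fit stands in for the Behavior Transformer.

```python
import numpy as np

rng = np.random.default_rng(1)

def expert(obs, goal):
    # Stand-in expert: move halfway toward the desired future state.
    return 0.5 * (goal - obs)

# Demonstrations labeled with the future state they reached.
obs = rng.normal(size=(400, 3))
goal = rng.normal(size=(400, 3))
actions = expert(obs, goal)

# Goal-conditioned behavior cloning: the policy sees [obs, goal] jointly,
# so the same state can map to different actions under different goals.
X = np.concatenate([obs, goal], axis=1)
W, *_ = np.linalg.lstsq(X, actions, rcond=None)

new_obs = np.array([[0.0, 0.0, 0.0]])
new_goal = np.array([[2.0, -2.0, 4.0]])
pred = np.concatenate([new_obs, new_goal], axis=1) @ W
print(pred)  # ~ [1.0, -1.0, 2.0], halfway toward the goal
```

Conditioning on the future state is what turns undirected play data into controllable, task-directed behavior: at test time the goal input selects which of the many demonstrated behaviors the policy should produce.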
arXiv Detail & Related papers (2022-10-18T17:59:55Z) - Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks [12.604533231243543]
Transporters with Visual Foresight (TVF) is able to achieve multi-task learning and zero-shot generalization to unseen tasks.
TVF is able to improve the performance of a state-of-the-art imitation learning method on both training and unseen tasks in simulation and real robot experiments.
arXiv Detail & Related papers (2022-02-22T09:35:09Z) - Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation [54.31414116478024]
In mobile manipulation (MM), robots can both navigate within and interact with their environment.
In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks.
arXiv Detail & Related papers (2021-12-09T23:54:59Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.