CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation
Learning
- URL: http://arxiv.org/abs/2212.05711v1
- Date: Mon, 12 Dec 2022 05:30:08 GMT
- Title: CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation
Learning
- Authors: Zhao Mandi, Homanga Bharadhwaj, Vincent Moens, Shuran Song, Aravind
Rajeswaran, Vikash Kumar
- Abstract summary: We propose a framework to better scale up robot learning under the lens of multi-task, multi-scene robot manipulation in kitchen environments.
Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training.
In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage.
- Score: 33.88636835443266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing robots that are capable of many skills and generalization to
unseen scenarios requires progress on two fronts: efficient collection of large
and diverse datasets, and training of high-capacity policies on the collected
data. While large datasets have propelled progress in other fields like
computer vision and natural language processing, collecting data of comparable
scale is particularly challenging for physical systems like robotics. In this
work, we propose a framework to bridge this gap and better scale up robot
learning, under the lens of multi-task, multi-scene robot manipulation in
kitchen environments. Our framework, named CACTI, has four stages that
separately handle data collection, data augmentation, visual representation
learning, and imitation policy training. In the CACTI framework, we highlight
the benefit of adapting state-of-the-art models for image generation as part of
the augmentation stage, and the significant improvement of training efficiency
by using pretrained out-of-domain visual representations at the compression
stage. Experimentally, we demonstrate that 1) on a real robot setup, CACTI
enables efficient training of a single policy capable of 10 manipulation tasks
involving kitchen objects, and robust to varying layouts of distractor objects;
2) in a simulated kitchen environment, CACTI trains a single policy on 18
semantic tasks across up to 50 layout variations per task. The simulation task
benchmark and augmented datasets in both real and simulated environments will
be released to facilitate future research.
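The abstract describes a four-stage pipeline (collect, augment, compress, train) without implementation detail. Below is a minimal, hypothetical sketch of how the augmentation, compression, and imitation stages might compose, assuming a generic augmentation function standing in for a pretrained image-generation model and an off-the-shelf torchvision ResNet standing in for the pretrained out-of-domain visual encoder. All names here are illustrative placeholders, not the authors' released code.

```python
# Minimal sketch of a CACTI-style pipeline (augment -> compress -> imitate).
# Placeholder code under stated assumptions; not the authors' implementation.
import torch
import torch.nn as nn
import torchvision

# --- Stage 2 (augmentation): stand-in for a pretrained image-generation model ---
def augment_scene(image: torch.Tensor) -> torch.Tensor:
    """Placeholder for semantic augmentation (e.g. in-painting new distractor
    objects or backgrounds with a generative model). Here: identity + noise."""
    return (image + 0.01 * torch.randn_like(image)).clamp(0.0, 1.0)

# --- Stage 3 (compression): frozen, out-of-domain pretrained visual encoder ---
# (weights download on first use; a ResNet-18 is used here purely as a stand-in)
encoder = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.DEFAULT)
encoder.fc = nn.Identity()          # expose 512-d features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

# --- Stage 4 (imitation): task-conditioned policy head trained with behavior cloning ---
NUM_TASKS, ACTION_DIM, FEAT_DIM = 10, 7, 512

policy = nn.Sequential(
    nn.Linear(FEAT_DIM + NUM_TASKS, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_step(images, task_ids, expert_actions):
    """One behavior-cloning step on (possibly augmented) demonstration frames."""
    with torch.no_grad():
        feats = encoder(augment_scene(images))            # frozen features
    task_onehot = nn.functional.one_hot(task_ids, NUM_TASKS).float()
    pred = policy(torch.cat([feats, task_onehot], dim=-1))
    loss = nn.functional.mse_loss(pred, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call on a dummy batch (stage 1, data collection, is assumed done offline).
loss = bc_step(torch.rand(8, 3, 224, 224),
               torch.randint(0, NUM_TASKS, (8,)),
               torch.randn(8, ACTION_DIM))
```

In the paper itself, the augmentation stage adapts state-of-the-art image-generation models rather than the noise placeholder above, and the compression stage uses pretrained out-of-domain visual representations; the sketch only illustrates how the stages fit together.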
Related papers
- Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers [41.069074375686164]
We propose Heterogeneous Pre-trained Transformers (HPT), which pre-train the trunk of a policy neural network to learn a shared representation across tasks and embodiments.
We conduct experiments to investigate the scaling behavior of the training objectives across as many as 52 datasets.
HPTs outperform several baselines and enhance the fine-tuned policy performance by over 20% on unseen tasks.
arXiv Detail & Related papers (2024-09-30T17:39:41Z) - VITAL: Visual Teleoperation to Enhance Robot Learning through Human-in-the-Loop Corrections [10.49712834719005]
We propose a low-cost visual teleoperation system for bimanual manipulation tasks, called VITAL.
Our approach leverages affordable hardware and visual processing techniques to collect demonstrations.
We enhance the generalizability and robustness of the learned policies by utilizing both real and simulated environments.
arXiv Detail & Related papers (2024-07-30T23:29:47Z) - Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks [0.0]
In this work, we focus on unsupervised vision-language-action mapping in the area of robotic manipulation.
We propose a model-invariant training alternative that improves the models' performance in a simulator by up to 55%.
Our work thus also sheds light on the potential benefits and limitations of using the current multimodal VAEs for unsupervised learning of robotic motion trajectories.
arXiv Detail & Related papers (2024-04-02T13:25:16Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to utilize offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Polybot: Training One Policy Across Robots While Embracing Variability [70.74462430582163]
We propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms.
Our framework first aligns the observation and action spaces of our policy across embodiments by using wrist cameras.
We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes.
arXiv Detail & Related papers (2023-07-07T17:21:16Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic
Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z) - A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z) - COG: Connecting New Skills to Past Experience with Offline Reinforcement
Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.