RoboAgent: Generalization and Efficiency in Robot Manipulation via
Semantic Augmentations and Action Chunking
- URL: http://arxiv.org/abs/2309.01918v1
- Date: Tue, 5 Sep 2023 03:14:39 GMT
- Authors: Homanga Bharadhwaj, Jay Vakil, Mohit Sharma, Abhinav Gupta, Shubham
Tulsiani, Vikash Kumar
- Abstract summary: We develop an efficient system for training universal agents capable of multi-task manipulation skills.
We are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks.
On average, RoboAgent outperforms prior methods by over 40% in unseen situations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The grand aim of having a single robot that can manipulate arbitrary objects
in diverse settings is at odds with the paucity of robotics datasets. Acquiring
and growing such datasets is strenuous due to manual efforts, operational
costs, and safety challenges. A path toward such a universal agent would
require a structured framework capable of wide generalization but trained
within a reasonable data budget. In this paper, we develop an efficient system
(RoboAgent) for training universal agents capable of multi-task manipulation
skills using (a) semantic augmentations that can rapidly multiply existing
datasets and (b) action representations that can extract performant policies
with small yet diverse multi-modal datasets without overfitting. In addition,
reliable task conditioning and an expressive policy architecture enable our
agent to exhibit a diverse repertoire of skills in novel situations specified
using language commands. Using merely 7500 demonstrations, we are able to train
a single agent capable of 12 unique skills, and demonstrate its generalization
over 38 tasks spread across common daily activities in diverse kitchen scenes.
On average, RoboAgent outperforms prior methods by over 40% in unseen
situations while being more sample-efficient and amenable to capability
improvements and extensions through fine-tuning. Videos at
https://robopen.github.io/
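The abstract's "action representations" refer to action chunking: rather than predicting one action per observation, the policy emits a short chunk of future actions, which shortens the effective decision horizon and limits compounding error on small, multi-modal datasets. A minimal sketch of the execution loop is below; the chunk length, action dimension, and the stand-in policy are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

CHUNK = 4    # hypothetical chunk length H (actions predicted per query)
ACT_DIM = 7  # hypothetical action dimension (e.g. a 7-DoF arm)

def dummy_policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in policy: maps an observation to a (CHUNK, ACT_DIM) action chunk."""
    rng = np.random.default_rng(int(obs.sum()) % 2**32)
    return rng.standard_normal((CHUNK, ACT_DIM))

def rollout(policy, obs_seq):
    """Execute a trajectory, re-querying the policy only every CHUNK steps."""
    actions = []
    pending = []                  # actions queued from the most recent chunk
    for obs in obs_seq:
        if not pending:           # chunk exhausted: query the policy again
            pending = list(policy(obs))
        actions.append(pending.pop(0))
    return np.stack(actions)

# 10-step rollout: the policy is queried only at steps 0, 4, and 8.
traj = rollout(dummy_policy, [np.full(3, t, dtype=float) for t in range(10)])
print(traj.shape)  # (10, 7)
```

Variants of this scheme (e.g. overlapping chunks combined by temporal ensembling) trade reactivity against smoothness; the simple non-overlapping version above only illustrates the reduced query rate.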
Related papers
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z) - Multi-task real-robot data with gaze attention for dual-arm fine manipulation [4.717749411286867]
This paper introduces a dataset of diverse object manipulations that includes dual-arm tasks and/or tasks requiring fine manipulation.
We have generated a dataset with 224k episodes (150 hours, 1,104 language instructions) that includes dual-arm fine tasks such as bowl moving, pencil-case opening, and banana peeling.
The dataset provides visual attention signals, language instructions, and dual-action labels -- a signal that separates each action into a robust reaching trajectory and a precise interaction with the object -- to support robust and precise object manipulation.
arXiv Detail & Related papers (2024-01-15T11:20:34Z) - RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer it to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z) - MimicGen: A Data Generation System for Scalable Robot Learning using
Human Demonstrations [55.549956643032836]
MimicGen is a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations.
We show that robot agents can be effectively trained on this generated dataset by imitation learning to achieve strong performance in long-horizon and high-precision tasks.
arXiv Detail & Related papers (2023-10-26T17:17:31Z) - RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in
One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z) - RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation [33.10577695383743]
We propose a multi-embodiment, multi-task generalist agent for robotic manipulation called RoboCat.
This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions.
With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100-1,000 examples.
arXiv Detail & Related papers (2023-06-20T17:35:20Z) - CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation
Learning [33.88636835443266]
We propose a framework to better scale up robot learning under the lens of multi-task, multi-scene robot manipulation in kitchen environments.
Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training.
In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage.
arXiv Detail & Related papers (2022-12-12T05:30:08Z) - Learning Multi-Arm Manipulation Through Collaborative Teleoperation [63.35924708783826]
Imitation Learning (IL) is a powerful paradigm to teach robots to perform manipulation tasks.
Many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk.
We present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms.
arXiv Detail & Related papers (2020-12-12T05:43:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.