RoboAgent: Generalization and Efficiency in Robot Manipulation via
Semantic Augmentations and Action Chunking
- URL: http://arxiv.org/abs/2309.01918v1
- Date: Tue, 5 Sep 2023 03:14:39 GMT
- Title: RoboAgent: Generalization and Efficiency in Robot Manipulation via
Semantic Augmentations and Action Chunking
- Authors: Homanga Bharadhwaj, Jay Vakil, Mohit Sharma, Abhinav Gupta, Shubham
Tulsiani, Vikash Kumar
- Abstract summary: We develop an efficient system for training universal agents capable of multi-task manipulation skills.
We are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks.
On average, RoboAgent outperforms prior methods by over 40% in unseen situations.
- Score: 54.776890150458385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The grand aim of having a single robot that can manipulate arbitrary objects
in diverse settings is at odds with the paucity of robotics datasets. Acquiring
and growing such datasets is strenuous due to manual efforts, operational
costs, and safety challenges. A path toward such a universal agent would
require a structured framework capable of wide generalization but trained
within a reasonable data budget. In this paper, we develop an efficient system
(RoboAgent) for training universal agents capable of multi-task manipulation
skills using (a) semantic augmentations that can rapidly multiply existing
datasets and (b) action representations that can extract performant policies
with small yet diverse multi-modal datasets without overfitting. In addition,
reliable task conditioning and an expressive policy architecture enable our
agent to exhibit a diverse repertoire of skills in novel situations specified
using language commands. Using merely 7500 demonstrations, we are able to train
a single agent capable of 12 unique skills, and demonstrate its generalization
over 38 tasks spread across common daily activities in diverse kitchen scenes.
On average, RoboAgent outperforms prior methods by over 40% in unseen
situations while being more sample-efficient and amenable to capability
improvements and extensions through fine-tuning. Videos at
https://robopen.github.io/
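Action chunking, one of the two ingredients named in the abstract, is easy to illustrate: rather than predicting a single action per observation, the policy predicts a chunk of H future actions and executes it before replanning, which reduces compounding errors when data is scarce. A minimal sketch with made-up dimensions, not the actual RoboAgent architecture:

```python
import torch
import torch.nn as nn

class ChunkedPolicy(nn.Module):
    """Toy policy mapping one observation to a chunk of H future actions."""

    def __init__(self, obs_dim=64, act_dim=7, horizon=8):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.ReLU(),
            # One head emits the entire chunk at once.
            nn.Linear(256, horizon * act_dim),
        )

    def forward(self, obs):
        # (batch, obs_dim) -> (batch, horizon, act_dim)
        return self.net(obs).view(-1, self.horizon, self.act_dim)

policy = ChunkedPolicy()
chunk = policy(torch.randn(1, 64))   # predict H=8 actions in one call
for t in range(policy.horizon):
    action = chunk[0, t]             # execute open-loop, then replan
```

Predicting whole chunks trades some closed-loop reactivity for temporally consistent behavior, which is one way a small but diverse multi-modal dataset can be exploited without overfitting.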
Related papers
- Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance [66.51390591688802]
Value-Guided Policy Steering (V-GPS) is compatible with a wide range of different generalist policies, without needing to fine-tune or even access the weights of the policy.
We show that the same value function can improve the performance of five different state-of-the-art policies with different architectures.
arXiv Detail & Related papers (2024-10-17T17:46:26Z)
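The steering recipe above lends itself to a short sketch: sample several candidate actions from the frozen generalist policy, score each with a learned Q-function, and execute the top-scoring one. The `policy_sample` and `q_value` interfaces below are hypothetical assumptions, not the actual V-GPS code:

```python
import numpy as np

def value_guided_action(policy_sample, q_value, obs, num_candidates=16):
    # Sample candidates from the frozen policy, keep the highest-value one.
    candidates = [policy_sample(obs) for _ in range(num_candidates)]
    scores = [q_value(obs, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Toy stand-ins: a Gaussian proposal and a value preferring small actions.
rng = np.random.default_rng(0)
best = value_guided_action(
    policy_sample=lambda obs: rng.normal(size=7),
    q_value=lambda obs, a: -float(np.linalg.norm(a)),
    obs=np.zeros(4),
)
```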
- Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning [35.42091835421386]
Multimodal task specification is essential for enhanced robotic performance.
We show that by leveraging unimodal instructions abundant in real data, we can effectively teach robots to learn multimodal task specifications.
arXiv Detail & Related papers (2024-10-02T13:23:02Z)
- Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation [49.03165169369552]
By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets.
We propose CrossFormer, a scalable and flexible transformer-based policy that can consume data from any embodiment.
We demonstrate that the same network weights can control vastly different robots, including single and dual arm manipulation systems, wheeled robots, quadcopters, and quadrupeds.
arXiv Detail & Related papers (2024-08-21T17:57:51Z)
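The cross-embodiment idea can be sketched as per-embodiment tokenizers and action heads around a single shared transformer trunk; all dimensions and module names below are illustrative assumptions, not CrossFormer's actual design:

```python
import torch
import torch.nn as nn

class SharedTrunkPolicy(nn.Module):
    """One trunk, many embodiments: per-embodiment tokenizers and action
    heads around a shared transformer (toy dimensions)."""

    def __init__(self, obs_dims, act_dims, d_model=128):
        super().__init__()
        self.tokenizers = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in obs_dims.items()}
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d_model, dim) for name, dim in act_dims.items()}
        )

    def forward(self, embodiment, obs_tokens):
        tokens = self.tokenizers[embodiment](obs_tokens)  # (B, T, d_model)
        feats = self.trunk(tokens)                        # shared weights
        return self.heads[embodiment](feats[:, -1])       # decode last token

policy = SharedTrunkPolicy(
    obs_dims={"arm": 32, "quadruped": 48},
    act_dims={"arm": 7, "quadruped": 12},
)
arm_action = policy("arm", torch.randn(1, 5, 32))
```

Only the tokenizers and heads are embodiment-specific, so the trunk's weights are the part that benefits from pooling data across robots.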
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
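Conceptually, unifying the three pre-training strategies amounts to optimizing their losses jointly; the sketch below simply sums them with weights, which is an illustrative assumption rather than the paper's exact formulation:

```python
def unified_pretraining_loss(mae_loss, lm_loss, action_loss,
                             w_mae=1.0, w_lm=1.0, w_act=1.0):
    # Masked visual reconstruction + language modeling + next-action
    # prediction; equal default weights are an illustrative assumption.
    return w_mae * mae_loss + w_lm * lm_loss + w_act * action_loss

total = unified_pretraining_loss(0.8, 2.1, 0.3)  # ~3.2
```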
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
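One-shot imitation learning, which this entry builds on, typically conditions the policy on an embedding of a single demonstration of the new task. A minimal sketch of that conditioning pattern, with made-up dimensions and a mean-pooled demo encoder as assumptions:

```python
import torch
import torch.nn as nn

class DemoConditionedPolicy(nn.Module):
    """Policy conditioned on one demonstration (one-shot IL sketch)."""

    def __init__(self, obs_dim=32, act_dim=7, demo_dim=64):
        super().__init__()
        self.demo_encoder = nn.Linear(obs_dim + act_dim, demo_dim)
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + demo_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, demo_obs, demo_acts):
        # Embed the demo as the mean of its per-step (obs, action) encodings.
        steps = torch.cat([demo_obs, demo_acts], dim=-1)   # (T, obs+act)
        demo_emb = self.demo_encoder(steps).mean(dim=0)    # (demo_dim,)
        ctx = torch.cat([obs, demo_emb.expand(obs.shape[0], -1)], dim=-1)
        return self.policy(ctx)

policy = DemoConditionedPolicy()
action = policy(torch.randn(1, 32), torch.randn(20, 32), torch.randn(20, 7))
```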
- RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation [33.10577695383743]
We propose a multi-embodiment, multi-task generalist agent for robotic manipulation called RoboCat.
Its training data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions.
With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples.
arXiv Detail & Related papers (2023-06-20T17:35:20Z)
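The self-improvement loop can be sketched as: adapt the generalist on 100-1000 demonstrations of a new task, roll out the adapted agent to self-generate more data, and retrain on the grown dataset. All helper functions below are hypothetical stand-ins, not RoboCat's pipeline:

```python
def fine_tune(agent, demos):
    """Stand-in for few-shot adaptation on 100-1000 demonstrations."""
    return agent

def collect_rollouts(agent):
    """Stand-in for rolling out the adapted agent to gather episodes."""
    return ["self_generated_episode"]

def train(agent, dataset):
    """Stand-in for retraining the generalist on the grown dataset."""
    return agent

def self_improvement_cycle(agent, dataset, new_task_demos, rounds=2):
    # Seed with the handful of demonstrations for the new task, then
    # alternate adaptation, self-generated collection, and retraining.
    dataset = dataset + new_task_demos
    for _ in range(rounds):
        specialist = fine_tune(agent, new_task_demos)
        dataset = dataset + collect_rollouts(specialist)
        agent = train(agent, dataset)
    return agent

agent = self_improvement_cycle("generalist", dataset=[], new_task_demos=["demo"])
```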
- Learning Multi-Arm Manipulation Through Collaborative Teleoperation [63.35924708783826]
Imitation Learning (IL) is a powerful paradigm to teach robots to perform manipulation tasks.
Many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk.
We present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms.
arXiv Detail & Related papers (2020-12-12T05:43:43Z)