Robot Learning with Sensorimotor Pre-training
- URL: http://arxiv.org/abs/2306.10007v2
- Date: Thu, 14 Dec 2023 16:56:39 GMT
- Title: Robot Learning with Sensorimotor Pre-training
- Authors: Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor
Darrell, Jitendra Malik
- Abstract summary: We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
- Score: 98.7755895548928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of
sensorimotor tokens. Given a sequence of camera images, proprioceptive robot
states, and actions, we encode the sequence into tokens, mask out a subset, and
train a model to predict the missing content from the rest. We hypothesize that
if a robot can predict the masked-out content it will have acquired a good
model of the physical world that can enable it to act. RPT is designed to
operate on latent visual representations which makes prediction tractable,
enables scaling to larger models, and allows fast inference on a real robot. To
evaluate our approach, we collected a dataset of 20,000 real-world trajectories
over 9 months using a combination of motion planning and grasping algorithms.
We find that sensorimotor pre-training consistently outperforms training from
scratch, has favorable scaling properties, and enables transfer across
different tasks, environments, and robots.
Related papers
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets [24.77850617214567]
We propose a foundation representation learning framework capturing both visual features and the dynamics information such as actions and proprioceptions of manipulation tasks.
Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions.
We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with a behavior cloning (BC)-like actor loss to predict actions during pre-training, along with a time contrastive loss.
arXiv Detail & Related papers (2024-10-29T17:58:13Z) - Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework,Track2Act predicts tracks of how points in an image should move in future time-steps based on a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
arXiv Detail & Related papers (2024-05-02T17:56:55Z) - Teaching Robots to Build Simulations of Themselves [7.886658271375681]
We introduce a self-supervised learning framework to enable robots model and predict their morphology, kinematics and motor control using only brief raw video data.
By observing their own movements, robots learn an ability to simulate themselves and predict their spatial motion for various tasks.
arXiv Detail & Related papers (2023-11-20T20:03:34Z) - Exploring Visual Pre-training for Robot Manipulation: Datasets, Models
and Methods [14.780597545674157]
We investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives.
We propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning.
arXiv Detail & Related papers (2023-08-07T14:24:52Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - Full-Body Visual Self-Modeling of Robot Morphologies [29.76701883250049]
Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions.
Recent progress in fully data-driven self-modeling has enabled machines to learn their own forward kinematics directly from task-agnostic interaction data.
Here, we propose that instead of directly modeling forward-kinematics, a more useful form of self-modeling is one that could answer space occupancy queries.
arXiv Detail & Related papers (2021-11-11T18:58:07Z) - Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
Besides content distribution, our model learns motion distribution, which is novel to handle the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z) - Where is my hand? Deep hand segmentation for visual self-recognition in
humanoid robots [129.46920552019247]
We propose the use of a Convolution Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask-RCNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
arXiv Detail & Related papers (2021-02-09T10:34:32Z) - Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.