Related papers: Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods

Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods

URL: http://arxiv.org/abs/2308.03620v1
Date: Mon, 7 Aug 2023 14:24:52 GMT
Title: Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
Authors: Ya Jing, Xuelin Zhu, Xingbin Liu, Qie Sima, Taozheng Yang, Yunhai Feng, Tao Kong
Abstract summary: We investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives. We propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning.
Score: 14.780597545674157
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual pre-training with large-scale real-world data has made great progress in recent years, showing great potential in robot learning with pixel observations. However, the recipes of visual pre-training for robot manipulation tasks are yet to be built. In this paper, we thoroughly investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives: pre-training datasets, model architectures and training methods. Several significant experimental findings are provided that are beneficial for robot learning. Further, we propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning. Concretely, the former employs contrastive learning to acquire underlying patterns from large-scale unlabeled data, while the latter aims learning visual semantics and temporal dynamics. Extensive experiments on robot manipulations in various simulation environments and the real robot demonstrate the superiority of the proposed scheme. Videos and more details can be found on \url{https://explore-pretrain-robot.github.io}.

Related papers

AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning [5.371855090716962]
Visual Robot Manipulation (VRM) aims to enable a robot to follow natural language instructions based on robot states and visual observations.<n>Existing approaches have employed vision-language pretraining with large-scale data.<n>We propose to learn from large-scale human action video datasets in an explicit way.
arXiv Detail & Related papers (2025-08-11T05:09:58Z)
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets [24.77850617214567]
We propose a foundation representation learning framework capturing both visual features and the dynamics information such as actions and proprioceptions of manipulation tasks. Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions. We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with a behavior cloning (BC)-like actor loss to predict actions during pre-training, along with a time contrastive loss.
arXiv Detail & Related papers (2024-10-29T17:58:13Z)
KALIE: Fine-Tuning Vision-Language Models for Open-World Manipulation without Robot Data [45.25288643161976]
We propose Keypoint Affordance Learning from Imagined Environments (KALIE) for robotic control in a scalable manner. Instead of directly producing motor commands, KALIE controls the robot by predicting point-based affordance representations. We demonstrate that KALIE can learn to robustly solve new manipulation tasks with unseen objects given only 50 example data points.
arXiv Detail & Related papers (2024-09-21T08:45:16Z)
Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction. The experimental results demonstrate that MPI exhibits remarkable improvement by 10% to 64% compared with previous state-of-the-art in real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics. Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens. We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on. In this work, we propose MEDAL++, a novel design for self-improving robotic systems. The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks. One of the key contributing factors to this progress is the scale of robot data used to train the models. We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
Continual Learning from Demonstration of Robotics Skills [5.573543601558405]
Methods for teaching motion skills to robots focus on training for a single skill at a time. We propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers.
arXiv Detail & Related papers (2022-02-14T16:26:52Z)
A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that a single robotic arm can learn sparse-reward manipulation policies from pixels. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works. However, learning a model that captures the dynamics of complex skills represents a major challenge. We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.