RVT-2: Learning Precise Manipulation from Few Demonstrations
- URL: http://arxiv.org/abs/2406.08545v1
- Date: Wed, 12 Jun 2024 18:00:01 GMT
- Title: RVT-2: Learning Precise Manipulation from Few Demonstrations
- Authors: Ankit Goyal, Valts Blukis, Jie Xu, Yijie Guo, Yu-Wei Chao, Dieter Fox
- Abstract summary: RVT-2 is a multitask 3D manipulation model that is 6X faster in training and 2X faster in inference than its predecessor RVT.
It achieves a new state-of-the-art on RLBench, improving the success rate from 65% to 82%.
RVT-2 is also effective in the real world, where it can learn tasks requiring high precision, like picking up and inserting plugs, with just 10 demonstrations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study how to build a robotic system that can solve multiple 3D manipulation tasks given language instructions. To be useful in industrial and household domains, such a system should be capable of learning new tasks with few demonstrations and solving them precisely. Prior works, like PerAct and RVT, have studied this problem, however, they often struggle with tasks requiring high precision. We study how to make them more effective, precise, and fast. Using a combination of architectural and system-level improvements, we propose RVT-2, a multitask 3D manipulation model that is 6X faster in training and 2X faster in inference than its predecessor RVT. RVT-2 achieves a new state-of-the-art on RLBench, improving the success rate from 65% to 82%. RVT-2 is also effective in the real world, where it can learn tasks requiring high precision, like picking up and inserting plugs, with just 10 demonstrations. Visual results, code, and trained model are provided at: https://robotic-view-transformer-2.github.io/.
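The RVT family's core mechanism is to re-render the scene point cloud into a few virtual views, predict a heatmap of the target gripper position in each view, and fuse the per-view peaks back into a 3D point. Below is a minimal sketch of that fusion step for axis-aligned orthographic views; the function name, resolution, and view layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def backproject_heatmaps(heatmaps, view_axes, ws_min, ws_max):
    """Fuse per-view 2D heatmaps into a 3D score volume and take its argmax.

    heatmaps:  dict of view name -> (res, res) score array.
    view_axes: dict of view name -> the two workspace axes (0=x, 1=y, 2=z)
               that the orthographic view images, in (row, col) order.
    """
    res = next(iter(heatmaps.values())).shape[0]
    vol = np.zeros((res, res, res))
    for name, hm in heatmaps.items():
        a, b = view_axes[name]
        shape = [1, 1, 1]
        shape[a] = shape[b] = res
        # Broadcast each view's scores along the one axis it cannot observe.
        vol += (hm if a < b else hm.T).reshape(shape)
    idx = np.unravel_index(np.argmax(vol), vol.shape)
    frac = np.array(idx) / (res - 1)
    return ws_min + frac * (ws_max - ws_min)

# Toy example: three axis-aligned views that agree on one 3D point.
res = 64
top = np.zeros((res, res));   top[10, 20] = 1.0    # images (x, y)
front = np.zeros((res, res)); front[10, 30] = 1.0  # images (x, z)
side = np.zeros((res, res));  side[20, 30] = 1.0   # images (y, z)
print(backproject_heatmaps(
    {"top": top, "front": front, "side": side},
    {"top": (0, 1), "front": (0, 2), "side": (1, 2)},
    ws_min=np.array([-0.5, -0.5, 0.0]),
    ws_max=np.array([0.5, 0.5, 1.0]),
))
```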
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning approach for robots to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
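The abstract does not spell out the mechanics, but a graph-search-and-retrieval scheme of this flavor can be sketched as: pool states from all demonstrations, stitch nearby states into a graph, and retrieve the demonstrated action whose node lies on the cheapest path to the goal. A hypothetical sketch; the distance-based stitching rule and all names are assumptions, not GSR's algorithm.

```python
import heapq
import numpy as np

def build_graph(states, eps):
    """Stitch states pooled from all demonstrations: connect pairs whose
    distance is below eps, so search can hop between demonstrations."""
    n = len(states)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        d = np.linalg.norm(states - states[i], axis=1)
        for j in np.flatnonzero((d < eps) & (d > 0)):
            adj[i].append((int(j), float(d[j])))
    return adj

def cheapest_cost(adj, start, goal):
    """Plain Dijkstra over the stitched graph."""
    dist, pq = {start: 0.0}, [(0.0, start)]
    while pq:
        c, u = heapq.heappop(pq)
        if u == goal:
            return c
        if c > dist.get(u, np.inf):
            continue
        for v, w in adj[u]:
            if c + w < dist.get(v, np.inf):
                dist[v] = c + w
                heapq.heappush(pq, (c + w, v))
    return np.inf

def retrieve_action(states, actions, adj, obs, goal_idx):
    """Snap the live observation to its nearest graph node, then reuse the
    demonstrated action of the neighbor on the cheapest path to the goal."""
    start = int(np.argmin(np.linalg.norm(states - obs, axis=1)))
    scored = [(w + cheapest_cost(adj, v, goal_idx), v) for v, w in adj[start]]
    return actions[min(scored)[1]] if scored else actions[start]
```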
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performances in various manipulation tasks on both real robots and in simulation.
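One concrete reading of the "automatic reward functions" idea is to wrap the environment so the reward is the success score emitted by a pretrained success-reward model. A minimal sketch, where `success_model` is a hypothetical image-and-text scorer and the gym-style step signature is assumed, not an API from the paper:

```python
class FoundationRewardWrapper:
    """Replace the environment's reward with the score of a (hypothetical)
    success-reward foundation model, in the spirit of RLFP: the agent
    explores on its own while a pretrained model judges success."""

    def __init__(self, env, success_model, goal_text):
        self.env = env
        self.success_model = success_model  # assumed: (image, text) -> [0, 1]
        self.goal_text = goal_text

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Assumed gym-style 4-tuple from the wrapped environment.
        obs, _, done, info = self.env.step(action)
        # Automatic reward: the model's success score for the current
        # observation under the language-specified goal.
        reward = float(self.success_model(obs["image"], self.goal_text))
        return obs, reward, done, info
```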
arXiv Detail & Related papers (2023-10-04T07:56:42Z)
- RVT: Robotic View Transformer for 3D Object Manipulation [46.25268237442356]
We propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate.
A single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct).
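The virtual views such a multi-view transformer consumes can be produced by orthographically projecting the workspace point cloud onto axis-aligned image planes. A simplified z-buffer sketch; resolution, axis choices, and naming are illustrative assumptions, not RVT's renderer.

```python
import numpy as np

def render_orthographic(points, colors, axes, depth_axis, res, lo, hi):
    """Splat a colored point cloud onto an axis-aligned virtual image plane
    with a one-pass z-buffer.

    points: (N, 3) workspace xyz; colors: (N, 3) in [0, 1];
    axes:   list of the two coordinate axes forming image rows/cols;
    lo/hi:  (3,) workspace bounds.
    """
    img = np.zeros((res, res, 3))
    zbuf = np.full((res, res), -np.inf)
    uv = ((points[:, axes] - lo[axes]) / (hi[axes] - lo[axes])
          * (res - 1)).astype(int)
    keep = np.all((uv >= 0) & (uv < res), axis=1)
    for (u, v), z, c in zip(uv[keep], points[keep, depth_axis], colors[keep]):
        if z > zbuf[u, v]:  # keep the point nearest a camera looking down this axis
            zbuf[u, v] = z
            img[u, v] = c
    return img

# Top-down view of a random colored cloud in a unit workspace.
rng = np.random.default_rng(0)
pts, cols = rng.uniform(0, 1, (5000, 3)), rng.uniform(0, 1, (5000, 3))
view = render_orthographic(pts, cols, axes=[0, 1], depth_axis=2,
                           res=128, lo=np.zeros(3), hi=np.ones(3))
```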
arXiv Detail & Related papers (2023-06-26T17:59:31Z) - VIMA: General Robot Manipulation with Multimodal Prompts [82.01214865117637]
We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts.
We develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks.
We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively.
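The prompt-conditioned autoregressive control loop can be illustrated with a toy transformer decoder that attends to interleaved prompt tokens and greedily emits one discretized action token per action dimension. All sizes, vocabularies, and names below are made up for illustration, not VIMA's actual configuration.

```python
import torch
import torch.nn as nn

class TinyPromptAgent(nn.Module):
    """Toy stand-in for a VIMA-style agent: a transformer decoder attends to
    an interleaved multimodal prompt and decodes discretized action tokens
    autoregressively."""

    def __init__(self, n_prompt_tokens=512, n_action_bins=256, d=128):
        super().__init__()
        self.prompt_emb = nn.Embedding(n_prompt_tokens, d)    # text + image patch ids
        self.action_emb = nn.Embedding(n_action_bins + 1, d)  # +1 for BOS
        layer = nn.TransformerDecoderLayer(d_model=d, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_action_bins)
        self.bos = n_action_bins

    @torch.no_grad()
    def act(self, prompt_ids, n_action_dims=4):
        """Greedy-decode one action-bin token per action dimension."""
        memory = self.prompt_emb(prompt_ids)                  # (B, P, d)
        tokens = torch.full((prompt_ids.shape[0], 1), self.bos,
                            dtype=torch.long, device=prompt_ids.device)
        for _ in range(n_action_dims):
            tgt = self.action_emb(tokens)
            mask = nn.Transformer.generate_square_subsequent_mask(tgt.shape[1])
            out = self.decoder(tgt, memory, tgt_mask=mask)
            nxt = self.head(out[:, -1]).argmax(-1, keepdim=True)
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens[:, 1:]  # drop BOS; bin indices per action dimension

agent = TinyPromptAgent()
prompt = torch.randint(0, 512, (1, 20))  # fake interleaved prompt tokens
print(agent.act(prompt))                 # four discretized action bins
```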
arXiv Detail & Related papers (2022-10-06T17:50:11Z) - Accelerating Robot Learning of Contact-Rich Manipulations: A Curriculum
Learning Study [4.045850174820418]
This paper presents a study on accelerating robot learning of contact-rich manipulation tasks based on Curriculum Learning combined with Domain Randomization (DR).
We tackle complex industrial assembly tasks, such as insertion, with position-controlled robots.
Results also show that even when training only in simulation with toy tasks, our method can learn policies that can be transferred to the real-world robot.
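A schedule of this kind can be sketched as a level counter that promotes when the rolling success rate clears a threshold, simultaneously tightening task tolerances and widening randomization ranges. Every threshold and range below is an illustrative assumption, not the paper's schedule.

```python
import numpy as np

class CurriculumDR:
    """Toy curriculum-plus-domain-randomization schedule: promote to the
    next level once recent success is high enough, shrinking the insertion
    clearance and broadening randomization as training progresses."""

    def __init__(self, n_levels=5, promote_at=0.8, window=50):
        self.level, self.n_levels = 0, n_levels
        self.promote_at, self.window = promote_at, window
        self.results = []

    def record(self, success):
        self.results.append(float(success))
        recent = self.results[-self.window:]
        if (len(recent) == self.window and np.mean(recent) >= self.promote_at
                and self.level < self.n_levels - 1):
            self.level += 1
            self.results.clear()  # restart the success window at the new level

    def sample_episode(self, rng):
        t = self.level / (self.n_levels - 1)
        return {
            "clearance_m": 2e-3 * (1 - t) + 1e-4 * t,    # 2 mm -> 0.1 mm
            "pose_noise_m": rng.uniform(0.0, 1e-3 + 9e-3 * t),
            "friction": rng.uniform(0.5, 0.5 + 0.8 * t),
        }

sched, rng = CurriculumDR(), np.random.default_rng(0)
params = sched.sample_episode(rng)  # configure the next episode with these
sched.record(success=True)          # then report its outcome
```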
arXiv Detail & Related papers (2022-04-27T11:08:39Z) - R3M: A Universal Visual Representation for Robot Manipulation [91.55543664116209]
We study how visual representations pre-trained on diverse human video data can enable data-efficient learning of robotic manipulation tasks.
We find that R3M improves task success by over 20% compared to training from scratch and by over 10% compared to state-of-the-art visual representations like CLIP and MoCo.
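The recipe such pretraining enables is simple: freeze the pretrained encoder and train only a small policy head by behavior cloning on the few demonstrations. A sketch with a torchvision ResNet standing in for the actual R3M encoder:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FrozenRepPolicy(nn.Module):
    """Behavior-cloning head on top of a frozen visual representation."""

    def __init__(self, action_dim=7):
        super().__init__()
        self.encoder = resnet18(weights=None)  # load pretrained weights in practice
        self.encoder.fc = nn.Identity()        # expose 512-d features
        for p in self.encoder.parameters():
            p.requires_grad = False            # the representation stays frozen
        self.head = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, action_dim))

    def forward(self, images):                 # images: (B, 3, 224, 224)
        with torch.no_grad():
            feats = self.encoder(images)
        return self.head(feats)

# Behavior cloning: regress demonstrated actions from frozen features.
policy = FrozenRepPolicy()
opt = torch.optim.Adam(policy.head.parameters(), lr=1e-3)
imgs, demo_actions = torch.randn(8, 3, 224, 224), torch.randn(8, 7)
loss = nn.functional.mse_loss(policy(imgs), demo_actions)
opt.zero_grad(); loss.backward(); opt.step()
```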
arXiv Detail & Related papers (2022-03-23T17:55:09Z) - Transporters with Visual Foresight for Solving Unseen Rearrangement
Tasks [12.604533231243543]
Transporters with Visual Foresight (TVF) achieves multi-task learning and zero-shot generalization to unseen tasks.
TVF improves the performance of a state-of-the-art imitation learning method on both training and unseen tasks in simulation and real-robot experiments.
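As the title indicates, TVF builds on transporter-style pick-and-place, where placement is scored by cross-correlating features cropped around the picked object with the scene feature map; the visual-foresight module (not sketched here) supplies intermediate goals. A minimal version of the correlation step, with random features as stand-ins for a learned backbone:

```python
import torch
import torch.nn.functional as F

def transporter_place_logits(scene_feat, crop_feat):
    """Transporter-style placement scoring: cross-correlate the crop's
    features with the full scene feature map; the argmax is the best
    placement pixel. Single-rotation simplification.

    scene_feat: (1, C, H, W); crop_feat: (1, C, h, w) with odd h, w.
    """
    h, w = crop_feat.shape[-2:]
    return F.conv2d(scene_feat, crop_feat, padding=(h // 2, w // 2))

# Toy usage: a 3x3 object descriptor correlated over a 64x64 scene map.
scene = torch.randn(1, 16, 64, 64)
crop = torch.randn(1, 16, 3, 3)
logits = transporter_place_logits(scene, crop)  # (1, 1, 64, 64)
best = int(logits.flatten().argmax())
print(divmod(best, 64))                         # best placement (row, col)
```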
We propose a methodology that uses reinforcement learning to achieve high performance in robotic assembly.
The method generates stiffness matrices online, which improve the performance of local trajectory optimization.
The effectiveness of the method was verified via experiments involving two contact-rich tasks.
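A common way to use online-generated stiffness is a Cartesian impedance law whose gains the policy sets at each step. The law below and its critically-damped damping rule are a standard textbook form, assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def impedance_wrench(x, xd, v, vd, stiffness_diag, damping_ratio=1.0):
    """Cartesian impedance law F = K (x_d - x) + D (v_d - v), with the
    stiffness K supplied online (e.g. by an RL policy).

    stiffness_diag: per-axis stiffness in N/m, the quantity being learned.
    """
    K = np.diag(stiffness_diag)
    # Critically damped by default: D = 2 * zeta * sqrt(K) (unit mass assumed).
    D = 2.0 * damping_ratio * np.sqrt(K)
    return K @ (xd - x) + D @ (vd - v)

# Soft along x/y to comply with contact, stiff along z to drive insertion.
F = impedance_wrench(
    x=np.array([0.01, 0.0, 0.10]), xd=np.zeros(3),
    v=np.zeros(3), vd=np.zeros(3),
    stiffness_diag=np.array([200.0, 200.0, 1500.0]),
)
print(F)
```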
arXiv Detail & Related papers (2020-02-27T15:54:43Z)