Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation
- URL: http://arxiv.org/abs/2406.10615v1
- Date: Sat, 15 Jun 2024 12:27:35 GMT
- Title: Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation
- Authors: Tong Zhang, Yingdong Hu, Jiacheng You, Yang Gao
- Abstract summary: SGRv2 is an imitation learning framework that enhances sample efficiency through improved visual and action representations.
SGRv2 excels in RLBench tasks with keyframe control using merely 5 demonstrations and surpasses the RVT baseline in 23 of 26 tasks.
- Score: 14.990771038350106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the high cost of collecting robotic data in the real world, sample efficiency is a consistently compelling pursuit in robotics. In this paper, we introduce SGRv2, an imitation learning framework that enhances sample efficiency through improved visual and action representations. Central to the design of SGRv2 is the incorporation of a critical inductive bias, action locality, which posits that a robot's actions are predominantly influenced by the target object and its interactions with the local environment. Extensive experiments in both simulated and real-world settings demonstrate that action locality is essential for boosting sample efficiency. SGRv2 excels in RLBench tasks with keyframe control using merely 5 demonstrations and surpasses the RVT baseline in 23 of 26 tasks. Furthermore, when evaluated on ManiSkill2 and MimicGen using dense control, SGRv2's success rate is 2.54 times that of SGR. In real-world environments, with only eight demonstrations, SGRv2 can perform a variety of tasks at a markedly higher success rate compared to baseline models. Project website: http://sgrv2-robot.github.io
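The action-locality bias described in the abstract can be illustrated with a minimal sketch: pool per-point scene features with weights that decay with distance from a predicted point of interest, so distant context contributes less to the action prediction. The function name, the Gaussian falloff, and the anchor-point formulation below are illustrative assumptions, not SGRv2's actual architecture.

```python
import numpy as np

def locality_weighted_features(points, features, anchor, sigma=0.1):
    """Pool per-point features with weights that favor points near an
    anchor (e.g. the target object), a simple form of action locality.

    points:   (N, 3) point-cloud coordinates
    features: (N, D) per-point features
    anchor:   (3,) predicted point of interest (hypothetical)
    """
    dists = np.linalg.norm(points - anchor, axis=1)   # (N,) distance to anchor
    weights = np.exp(-dists**2 / (2 * sigma**2))      # Gaussian falloff
    weights /= weights.sum()                          # normalize to sum to 1
    return (weights[:, None] * features).sum(axis=0)  # (D,) pooled feature
```

With a small sigma, a point at the anchor dominates the pooled feature while far-away points are effectively ignored.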
Related papers
- The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations [13.747258771184372]
There are several open challenges to applying task-parameterized Gaussian mixture models (TP-GMMs) in the wild.
We factorize the robot's end-effector velocity into its direction and magnitude.
We then segment and sequence skills from complex demonstration trajectories.
Our approach enables learning complex manipulation tasks from just five demonstrations.
arXiv Detail & Related papers (2024-07-18T12:01:09Z) - Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
We study reward shaping with vision-language models (VLMs) to define dense rewards for robotic learning.
On a real-world manipulation task specified by natural language description, we find that these rewards improve the sample efficiency of autonomous RL.
arXiv Detail & Related papers (2024-07-14T21:41:29Z) - Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations [77.31328397965653]
We introduce Ag2Manip (Agent-Agnostic representations for Manipulation), a framework aimed at surmounting challenges through two key innovations.
A novel agent-agnostic visual representation derived from human manipulation videos, with the specifics of embodiments obscured to enhance generalizability.
An agent-agnostic action representation abstracting a robot's kinematics to a universal agent proxy, emphasizing crucial interactions between end-effector and object.
arXiv Detail & Related papers (2024-04-26T16:40:17Z) - ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation [58.615616224739654]
Conventional robotic manipulation methods usually learn a semantic representation of the observation for prediction.
We propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation.
Our framework can outperform the state-of-the-art methods by 13.1% in average success rate.
arXiv Detail & Related papers (2024-03-13T08:06:41Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z) - Learning to navigate efficiently and precisely in real environments [14.52507964172957]
Embodied AI literature focuses on end-to-end agents trained in simulators like Habitat or AI-Thor.
In this work we explore end-to-end training of agents in simulation in settings which minimize the sim2real gap.
arXiv Detail & Related papers (2024-01-25T17:50:05Z) - Sample Efficient Robot Learning with Structured World Models [3.1761323820497656]
In game environments, the use of world models has been shown to improve sample efficiency while still achieving good performance.
We compare RGB image observations with a feature space leveraging built-in structure, a common approach in robot skill learning, and evaluate the impact on task performance and learning efficiency with and without a world model.
arXiv Detail & Related papers (2022-10-21T22:08:55Z) - Metric Residual Networks for Sample Efficient Goal-conditioned Reinforcement Learning [52.59242013527014]
Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications.
Sample efficiency is of utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal.
We introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly-used monolithic network architecture.
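The sparsity problem noted above is worth making concrete: under one common GCRL convention (an illustrative assumption, not this paper's exact setup), the agent receives a nonzero reward only inside a small tolerance ball around the goal, so almost every transition is uninformative.

```python
import numpy as np

def sparse_goal_reward(state, goal, tol=0.05):
    """Default sparse GCRL reward: 1.0 only when the state is within
    `tol` of the goal, 0.0 otherwise -- the sparsity that makes
    sample efficiency critical."""
    return float(np.linalg.norm(state - goal) <= tol)
```

With such a reward, architectures or relabeling schemes that extract more signal per transition directly translate into sample-efficiency gains.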
arXiv Detail & Related papers (2022-08-17T08:04:41Z) - SAGCI-System: Towards Sample-Efficient, Generalizable, Compositional, and Incremental Robot Learning [41.19148076789516]
We introduce a systematic learning framework called SAGCI-system towards achieving the above four requirements.
Our system first takes the raw point clouds gathered by the camera mounted on the robot's wrist as input and produces an initial model of the surrounding environment, represented as a URDF.
The robot then uses interactive perception to interact with the environment, verifying and modifying the URDF online.
arXiv Detail & Related papers (2021-11-29T16:53:49Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features, using temporal cues in videos and inherent cross-modal correlations for gesture recognition.
Results show that our approach recovers performance with substantial gains, up to 12.91% in accuracy and 20.16% in F1 score, without using any real-robot annotations.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.