Related papers: SIME: Enhancing Policy Self-Improvement with Modal-level Exploration

SIME: Enhancing Policy Self-Improvement with Modal-level Exploration

URL: http://arxiv.org/abs/2505.01396v1
Date: Fri, 02 May 2025 17:13:03 GMT
Title: SIME: Enhancing Policy Self-Improvement with Modal-level Exploration
Authors: Yang Jin, Jun Lv, Wenye Yu, Hongjie Fang, Yong-Lu Li, Cewu Lu,
Abstract summary: We identify the key to successful self-improvement: modal-level exploration and data selection.<n>By incorporating a modal-level exploration mechanism during policy execution, the robot can produce more diverse and multi-modal interactions.<n>We successfully demonstrate effective robot self-improvement on both simulation benchmarks and real-world experiments.
Score: 49.86173021151849
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-improvement requires robotic systems to initially learn from human-provided data and then gradually enhance their capabilities through interaction with the environment. This is similar to how humans improve their skills through continuous practice. However, achieving effective self-improvement is challenging, primarily because robots tend to repeat their existing abilities during interactions, often failing to generate new, valuable data for learning. In this paper, we identify the key to successful self-improvement: modal-level exploration and data selection. By incorporating a modal-level exploration mechanism during policy execution, the robot can produce more diverse and multi-modal interactions. At the same time, we select the most valuable trials and high-quality segments from these interactions for learning. We successfully demonstrate effective robot self-improvement on both simulation benchmarks and real-world experiments. The capability for self-improvement will enable us to develop more robust and high-success-rate robotic control strategies at a lower cost. Our code and experiment scripts are available at https://ericjin2002.github.io/SIME/

Related papers

Simulation-Aided Policy Tuning for Black-Box Robot Learning [47.83474891747279]
We present a novel black-box policy search algorithm focused on data-efficient policy improvements. The algorithm learns directly on the robot and treats simulation as an additional information source to speed up the learning process. We show fast and successful task learning on a robot manipulator with the aid of an imperfect simulator.
arXiv Detail & Related papers (2024-11-21T15:52:23Z)
Generalized Robot Learning Framework [10.03174544844559]
We present a low-cost robot learning framework that is both easily reproducible and transferable to various robots and environments. We demonstrate that deployable imitation learning can be successfully applied even to industrial-grade robots.
arXiv Detail & Related papers (2024-09-18T15:34:31Z)
Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning algorithm for a robot to acquire manipulation skills. We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval. GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning. Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy. Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on. In this work, we propose MEDAL++, a novel design for self-improving robotic systems. The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis is an attempt towards learning reward functions from human users by using other, more reliable data modalities. We first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, scaled comparisons; and describe how a robot can use these various forms of human feedback to infer a reward function.
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions. Deep learning networks have achieved state-of-the-art results and demonstrated to be suitable tools to address such a task. One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection. We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted. In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
arXiv Detail & Related papers (2020-02-25T18:56:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.