ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning
- URL: http://arxiv.org/abs/2512.16861v1
- Date: Thu, 18 Dec 2025 18:32:39 GMT
- Title: ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning
- Authors: Zihan Zhou, Animesh Garg, Ajay Mandlekar, Caelan Garrett,
- Abstract summary: ReinforceGen is a system that combines task decomposition, data generation, imitation learning, and motion planning.<n>It reaches 80% success rate on all tasks with visuomotor controls in the highest reset range setting.<n>Additional ablation studies show that our fine-tuning approaches contribute to an 89% average performance increase.
- Score: 32.137320056371784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-horizon manipulation has been a long-standing challenge in the robotics community. We propose ReinforceGen, a system that combines task decomposition, data generation, imitation learning, and motion planning to form an initial solution, and improves each component through reinforcement-learning-based fine-tuning. ReinforceGen first segments the task into multiple localized skills, which are connected through motion planning. The skills and motion planning targets are trained with imitation learning on a dataset generated from 10 human demonstrations, and then fine-tuned through online adaptation and reinforcement learning. When benchmarked on the Robosuite dataset, ReinforceGen reaches 80% success rate on all tasks with visuomotor controls in the highest reset range setting. Additional ablation studies show that our fine-tuning approaches contributes to an 89% average performance increase. More results and videos available in https://reinforcegen.github.io/
Related papers
- FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation [60.28409233931666]
We introduce FieldGen, a field-guided data generation framework that enables scalable, diverse, and high-quality real-world data collection.<n>Experiments demonstrate that policies trained with FieldGen achieve higher success rates and improved stability compared to teleoperation-based baselines.
arXiv Detail & Related papers (2025-10-23T17:47:12Z) - RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning [53.8293458872774]
We propose Reinforcement Learning Distilled Generalists (RLDG) to generate high-quality training data for finetuning generalist policies.<n>We demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations.<n>Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems.
arXiv Detail & Related papers (2024-12-13T04:57:55Z) - SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment [33.53559296053225]
We propose SkillMimicGen, an automated system for generating demonstration datasets from a few human demos.
SkillGen segments human demos into manipulation skills, adapts these skills to new contexts, and stitches them together through free-space transit and transfer motion.
We demonstrate the efficacy of SkillGen by generating over 24K demonstrations across 18 task variants in simulation from just 60 human demonstrations.
arXiv Detail & Related papers (2024-10-24T16:59:26Z) - SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation [58.14969377419633]
We propose spire, a system that decomposes tasks into smaller learning subproblems and second combines imitation and reinforcement learning to maximize their strengths.
We find that spire outperforms prior approaches that integrate imitation learning, reinforcement learning, and planning by 35% to 50% in average task performance.
arXiv Detail & Related papers (2024-10-23T17:42:07Z) - FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance on previously demonstrated and on entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z) - RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning
During Deployment [25.186525630548356]
Sirius is a principled framework for humans and robots to collaborate through a division of work.
Partially autonomous robots are tasked with handling a major portion of decision-making where they work reliably.
We introduce a new learning algorithm to improve the policy's performance on the data collected from the task executions.
arXiv Detail & Related papers (2022-11-15T18:53:39Z) - Reinforcement Learning Experiments and Benchmark for Solving Robotic
Reaching Tasks [0.0]
Reinforcement learning has been successfully applied to solving the reaching task with robotic arms.
It is shown that augmenting the reward signal with the Hindsight Experience Replay exploration technique increases the average return of off-policy agents.
arXiv Detail & Related papers (2020-11-11T14:00:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.