Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation
- URL: http://arxiv.org/abs/2601.07821v1
- Date: Mon, 12 Jan 2026 18:53:11 GMT
- Title: Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation
- Authors: Huanyu Li, Kun Lei, Sheng Zang, Kaizhe Hu, Yongyuan Liang, Bo An, Xiaoli Li, Huazhe Xu
- Abstract summary: Failure-Aware Offline-to-Online Reinforcement Learning (FARL) is a new paradigm that minimizes failures during real-world reinforcement learning. We propose an algorithm that integrates a world-model-based safety critic and a recovery policy trained offline to prevent failures during online exploration.
- Score: 48.26705293834693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training algorithms based on deep reinforcement learning can push the limits of robotic models for specific objectives, such as generalizability, accuracy, and robustness. However, Intervention-requiring Failures (IR Failures), e.g., a robot spilling water or breaking fragile glass, inevitably occur during real-world exploration, hindering the practical deployment of such a paradigm. To tackle this, we introduce Failure-Aware Offline-to-Online Reinforcement Learning (FARL), a new paradigm minimizing failures during real-world reinforcement learning. We create FailureBench, a benchmark that incorporates common failure scenarios requiring human intervention, and propose an algorithm that integrates a world-model-based safety critic and a recovery policy trained offline to prevent failures during online exploration. Extensive simulation and real-world experiments demonstrate the effectiveness of FARL in significantly reducing IR Failures while improving performance and generalization during online reinforcement learning post-training. FARL reduces IR Failures by 73.1% while elevating performance by 11.3% on average during real-world RL post-training. Videos and code are available at https://failure-aware-rl.github.io.
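The abstract describes a shielding-style control flow: a world-model-based safety critic scores the action proposed by the exploring policy, and control falls back to an offline-trained recovery policy when the predicted risk is too high. The following is a minimal sketch of that pattern, not FARL's actual implementation; `task_policy`, `recovery_policy`, `safety_critic`, and `risk_threshold` are hypothetical stand-ins.

```python
import numpy as np

class FailureAwareAgent:
    """Minimal sketch of a failure-aware action selector.

    The safety critic is assumed to map a (state, action) pair to an
    estimated probability that executing the action leads to an
    intervention-requiring failure. All components are stand-ins.
    """

    def __init__(self, task_policy, recovery_policy, safety_critic,
                 risk_threshold=0.1):
        self.task_policy = task_policy
        self.recovery_policy = recovery_policy
        self.safety_critic = safety_critic
        self.risk_threshold = risk_threshold

    def act(self, state):
        action = self.task_policy(state)
        risk = self.safety_critic(state, action)  # predicted failure probability
        if risk > self.risk_threshold:
            # Hand control to the offline-trained recovery policy
            # instead of executing the risky exploratory action.
            return self.recovery_policy(state), True
        return action, False


# Toy usage with random stand-in components.
rng = np.random.default_rng(0)
agent = FailureAwareAgent(
    task_policy=lambda s: rng.normal(size=2),
    recovery_policy=lambda s: np.zeros(2),  # e.g., retreat to a safe pose
    safety_critic=lambda s, a: float(np.linalg.norm(a) > 2.0),
)
action, recovered = agent.act(state=np.zeros(4))
```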
Related papers
- WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL [30.884160045861616]
We propose WoVR, a reliable world-model-based reinforcement learning framework for post-training VLA policies. It improves rollout stability through a controllable action-conditioned video world model, and reshapes imagined interaction to reduce effective error depth via Keyframe-evolved Rollouts.
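The core idea is to simulate interaction inside a learned action-conditioned model rather than on the real robot. Below is a minimal sketch of such an imagined rollout loop; `world_model` and `policy` are hypothetical stand-ins, and the paper's video model and keyframe scheme are not reproduced here.

```python
import numpy as np

def imagined_rollout(world_model, policy, start_obs, horizon=16):
    """Roll a policy forward inside a learned world model.

    `world_model(obs, action)` is a hypothetical action-conditioned
    predictor returning (next_obs, reward).
    """
    obs, trajectory = start_obs, []
    for _ in range(horizon):
        action = policy(obs)
        obs, reward = world_model(obs, action)
        trajectory.append((action, reward))
    return trajectory

# Toy usage with stand-in components.
rng = np.random.default_rng(0)
traj = imagined_rollout(
    world_model=lambda o, a: (o + 0.1 * a, float(-np.abs(o).sum())),
    policy=lambda o: rng.normal(size=o.shape),
    start_obs=np.zeros(3),
)
```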
arXiv Detail & Related papers (2026-02-15T03:48:20Z) - Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning. Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training.
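Filtering out negatively rewarded samples amounts to a simple rejection step over online transitions. A minimal sketch follows, assuming transitions are (state, action, reward) tuples; the human-in-the-loop reward labelling is out of scope here.

```python
def rejection_filter(transitions, reward_threshold=0.0):
    """Keep only transitions whose reward clears a threshold: one
    minimal reading of 'filtering out negatively rewarded samples'.
    """
    return [t for t in transitions if t[2] > reward_threshold]

batch = [("s0", "a0", 1.0), ("s1", "a1", -0.5), ("s2", "a2", 0.2)]
accepted = rejection_filter(batch)  # drops the negatively rewarded sample
```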
arXiv Detail & Related papers (2025-10-30T11:53:08Z) - Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator [50.191655141020505]
Reinforcement Learning (RL) has demonstrated impressive capabilities in robotic control but remains challenging due to high sample complexity, safety concerns, and the sim-to-real gap. We introduce Offline Robotic World Model (RWM-O), a model-based approach that explicitly estimates uncertainty to improve policy learning without reliance on a physics simulator.
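One common way to explicitly estimate uncertainty in a learned world model is ensemble disagreement, used as a pessimistic reward penalty. The sketch below shows that generic model-based RL recipe; it is an illustrative assumption, not RWM-O's exact estimator.

```python
import numpy as np

def uncertainty_penalized_step(ensemble, obs, action, beta=1.0):
    """One imagined step with an uncertainty penalty.

    `ensemble` is a hypothetical list of dynamics models, each mapping
    (obs, action) to (next_obs, reward); disagreement across members
    serves as the epistemic-uncertainty proxy.
    """
    preds = [m(obs, action) for m in ensemble]
    next_obs = np.mean([p[0] for p in preds], axis=0)
    reward = float(np.mean([p[1] for p in preds]))
    disagreement = float(np.mean(np.std([p[0] for p in preds], axis=0)))
    return next_obs, reward - beta * disagreement  # pessimistic reward

# Toy ensemble whose members disagree slightly on the dynamics.
ensemble = [lambda o, a, b=b: (o + b * a, float(-np.abs(o).sum()))
            for b in (0.09, 0.10, 0.11)]
next_obs, r = uncertainty_penalized_step(ensemble, np.zeros(3), np.ones(3))
```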
arXiv Detail & Related papers (2025-04-23T12:58:15Z) - Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations [4.849820402342814]
Offline reinforcement learning is particularly promising for robot control applications. However, robustness against real-world challenges, such as joint actuator faults in robots, remains a critical concern. This study evaluates the robustness of existing offline reinforcement learning methods using legged robots from OpenAI Gym.
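An action-perturbation robustness test can be expressed as a thin environment wrapper that corrupts commands before they reach the simulator. A sketch follows, assuming a Gym-style `step`/`reset` interface; the per-joint dropout fault model is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

class ActuatorFaultWrapper:
    """Wrap an environment so that each joint command can randomly
    drop out, emulating actuator faults."""

    def __init__(self, env, fault_prob=0.1, seed=0):
        self.env = env
        self.fault_prob = fault_prob
        self.rng = np.random.default_rng(seed)

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        # Each joint independently fails with probability `fault_prob`;
        # a failed joint outputs zero torque for this step.
        mask = self.rng.random(len(action)) >= self.fault_prob
        return self.env.step(np.asarray(action) * mask)
```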
arXiv Detail & Related papers (2024-12-25T05:02:22Z) - Augmenting Replay in World Models for Continual Reinforcement Learning [0.0]
Continual RL requires an agent to learn new tasks without forgetting previous ones, while improving on both past and future tasks.
The most common approaches use model-free algorithms and replay buffers to mitigate catastrophic forgetting.
We introduce WMAR (World Models with Augmented Replay), a model-based RL algorithm with a memory-efficient replay buffer.
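A memory-efficient replay buffer for continual RL typically caps storage while keeping coverage of older tasks. The sketch below combines a FIFO of recent transitions with a reservoir sample over history; this split is an illustrative guess, not WMAR's exact design.

```python
import random
from collections import deque

class AugmentedReplayBuffer:
    """Capped replay buffer shared across tasks: a small FIFO of
    recent experience plus a uniform reservoir sample of history."""

    def __init__(self, recent_size=1000, reservoir_size=1000, seed=0):
        self.recent = deque(maxlen=recent_size)
        self.reservoir, self.reservoir_size = [], reservoir_size
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        self.recent.append(transition)
        self.seen += 1
        if len(self.reservoir) < self.reservoir_size:
            self.reservoir.append(transition)
        else:
            # Classic reservoir sampling keeps a uniform sample of all
            # transitions ever seen, within a fixed memory budget.
            j = self.rng.randrange(self.seen)
            if j < self.reservoir_size:
                self.reservoir[j] = transition

    def sample(self, k):
        pool = list(self.recent) + self.reservoir
        return self.rng.sample(pool, min(k, len(pool)))
```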
arXiv Detail & Related papers (2024-01-30T00:48:26Z) - Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning [93.99377042564919]
This paper builds more flexible constraints for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies.
We introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces.
arXiv Detail & Related papers (2023-05-24T15:45:35Z) - Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
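A curriculum of initial states can be sketched as ordering candidate reset states and starting the agent farther from the goal as its competence grows. Distance-to-goal below stands in for VaPRL's value-based ordering (a `value_fn` could replace it); all names are illustrative.

```python
import numpy as np

def pick_start_state(candidate_starts, goal, success_rate):
    """Choose a reset state from prior data: early in training, start
    near the goal; as the success rate rises, start farther away.
    """
    scores = [np.linalg.norm(np.asarray(s) - np.asarray(goal))
              for s in candidate_starts]
    order = np.argsort(scores)                  # nearest-to-goal first
    idx = int(success_rate * (len(order) - 1))  # move outward with competence
    return candidate_starts[order[idx]]

start = pick_start_state([(0.1, 0.0), (0.5, 0.5), (1.0, 1.0)],
                         goal=(0.0, 0.0), success_rate=0.3)
```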
arXiv Detail & Related papers (2021-07-27T16:39:45Z) - CLAMGen: Closed-Loop Arm Motion Generation via Multi-view Vision-Based RL [4.014524824655106]
We propose a vision-based reinforcement learning (RL) approach for closed-loop trajectory generation in an arm reaching problem.
Arm trajectory generation is a fundamental robotics problem which entails finding collision-free paths to move the robot's body.
arXiv Detail & Related papers (2021-03-24T15:33:03Z)