RecoveryChaining: Learning Local Recovery Policies for Robust Manipulation
- URL: http://arxiv.org/abs/2410.13979v1
- Date: Thu, 17 Oct 2024 19:14:43 GMT
- Title: RecoveryChaining: Learning Local Recovery Policies for Robust Manipulation
- Authors: Shivam Vats, Devesh K. Jha, Maxim Likhachev, Oliver Kroemer, Diego Romeres
- Abstract summary: We propose to use hierarchical reinforcement learning to learn a separate recovery policy for a robot.
The recovery policy is triggered when a failure is detected based on sensory observations and seeks to take the robot to a state from which it can complete the task.
We evaluate our approach in three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by baselines.
- Score: 41.38308130776887
- Abstract: Model-based planners and controllers are commonly used to solve complex manipulation problems as they can efficiently optimize diverse objectives and generalize to long-horizon tasks. However, they are limited by the fidelity of their model, which often leads to failures during deployment. To enable a robot to recover from such failures, we propose to use hierarchical reinforcement learning to learn a separate recovery policy. The recovery policy is triggered when a failure is detected based on sensory observations and seeks to take the robot to a state from which it can complete the task using the nominal model-based controllers. Our approach, called RecoveryChaining, uses a hybrid action space, where the model-based controllers are provided as additional \emph{nominal} options, which allows the recovery policy to decide how to recover, when to switch to a nominal controller, and which controller to switch to, even with \emph{sparse rewards}. We evaluate our approach in three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by baselines. Finally, we successfully transfer recovery policies learned in simulation to a physical robot to demonstrate the feasibility of sim-to-real transfer with our method.
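The abstract describes a recovery policy that acts in a hybrid action space: at each step it can either issue a low-level motion command or hand control to one of the nominal model-based controllers, and it must learn this from sparse rewards. A minimal sketch of how such a hybrid action interface might be wired up; the environment and controller methods (`get_obs`, `is_done`, `act`) are illustrative assumptions, not the authors' code:

```python
class HybridRecoveryActionWrapper:
    """Illustrative wrapper: the recovery agent picks either a low-level
    motion command or one of the nominal model-based controllers."""

    def __init__(self, env, nominal_controllers):
        self.env = env                      # underlying manipulation environment
        self.nominal = nominal_controllers  # e.g. [reach_ctrl, grasp_ctrl, place_ctrl]

    def step(self, action):
        # action = (option_id, motion_cmd); option 0 executes a raw motion,
        # options 1..K hand control to nominal controller (option_id - 1).
        option_id, motion_cmd = action
        if option_id == 0:
            return self.env.step(motion_cmd)
        return self.run_controller(self.nominal[option_id - 1])

    def run_controller(self, ctrl):
        # Roll the chosen nominal controller until it terminates; the recovery
        # policy only observes the outcome (sparse task reward).
        obs, reward, done, info = self.env.get_obs(), 0.0, False, {}
        while not ctrl.is_done(obs) and not done:
            obs, reward, done, info = self.env.step(ctrl.act(obs))
        return obs, reward, done, info
```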
Related papers
- Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators (BTMG) Approach for Failure Management [0.0]
We propose a novel approach that models recovery behaviors as adaptable robotic skills, leveraging the Behavior Trees and Motion Generators (BTMG) framework for policy representation.
We assess our methodology through a series of progressively challenging scenarios within a peg-in-a-hole task, demonstrating the approach's effectiveness in enhancing operational efficiency and task success rates in collaborative robotics settings.
arXiv Detail & Related papers (2024-04-09T08:56:43Z)
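The BTMG paper models recovery behaviors as adaptable skills represented with behavior trees and motion generators. The sketch below shows only the generic behavior-tree fallback pattern with made-up skill names; it is not the BTMG framework itself:

```python
class Node:
    def tick(self, blackboard):
        raise NotImplementedError  # returns "SUCCESS", "FAILURE", or "RUNNING"

class Fallback(Node):
    """Try children in order; succeed as soon as one child succeeds."""
    def __init__(self, children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            status = child.tick(blackboard)
            if status != "FAILURE":
                return status
        return "FAILURE"

class Skill(Node):
    """Wraps a parameterized skill (e.g. a motion generator) as a leaf node."""
    def __init__(self, name, run_fn):
        self.name, self.run_fn = name, run_fn

    def tick(self, blackboard):
        return "SUCCESS" if self.run_fn(blackboard) else "FAILURE"

# A recovery tree for a peg-in-hole failure: retry insertion, else re-grasp
# and retry, else request operator help (all skill names are hypothetical).
recovery = Fallback([
    Skill("retry_insert", lambda bb: bb.get("peg_aligned", False)),
    Skill("regrasp_and_insert", lambda bb: bb.get("regrasp_ok", False)),
    Skill("request_operator_help", lambda bb: True),
])
status = recovery.tick({"peg_aligned": False, "regrasp_ok": True})  # -> "SUCCESS"
```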
- Recover: A Neuro-Symbolic Framework for Failure Detection and Recovery [2.0554045007430672]
This paper introduces Recover, a neuro-symbolic framework for online failure identification and recovery.
By integrating logical rules and LLM-based planners, Recover exploits symbolic information to enhance the ability of LLMs to generate recovery plans.
arXiv Detail & Related papers (2024-03-31T17:54:22Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
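This distillation recipe amounts to supervised regression from states to expert actions. A rough sketch of the gradient-boosting variant using scikit-learn, with a toy expert and dataset standing in for the paper's RL policies:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def distill_policy_to_gbm(expert_policy, states, n_estimators=200):
    """Fit one gradient-boosted regressor per action dimension on
    state -> expert-action pairs (behavioral-cloning style distillation)."""
    actions = np.array([expert_policy(s) for s in states])
    students = []
    for dim in range(actions.shape[1]):
        gbm = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=3)
        gbm.fit(states, actions[:, dim])
        students.append(gbm)
    # The distilled policy queries each per-dimension student model.
    return lambda s: np.array([g.predict(s.reshape(1, -1))[0] for g in students])

# Toy usage: a fake "expert" mapping a 4-D state to a 2-D action.
rng = np.random.default_rng(0)
expert = lambda s: np.array([np.tanh(s[0] - s[2]), 0.5 * s[1] * s[3]])
S = rng.normal(size=(2000, 4))
student = distill_policy_to_gbm(expert, S)
print(student(rng.normal(size=4)))  # distilled action for a new state
```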
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
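The quantization idea can be pictured as learning a discrete codebook over the actions present in the offline dataset. The paper proposes an adaptive, learned scheme; the k-means sketch below is only a simple stand-in to illustrate replacing continuous actions with codebook indices:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_action_codebook(dataset_actions, n_bins=64, seed=0):
    """Cluster the actions in the offline dataset into a discrete codebook."""
    km = KMeans(n_clusters=n_bins, random_state=seed, n_init=10).fit(dataset_actions)
    return km.cluster_centers_             # shape (n_bins, action_dim)

def quantize(action, codebook):
    """Map a continuous action to the index of its nearest codebook entry."""
    return int(np.argmin(np.linalg.norm(codebook - action, axis=1)))

# Toy usage: 7-DoF actions from a logged dataset.
rng = np.random.default_rng(1)
logged = rng.uniform(-1, 1, size=(5000, 7))
codebook = build_action_codebook(logged)
idx = quantize(logged[0], codebook)        # discrete action the offline RL agent picks
executed = codebook[idx]                   # continuous action actually executed
```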
- REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation [61.7171775202833]
We introduce an efficient system for learning dexterous manipulation skills with reinforcement learning.
The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping.
Our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy.
arXiv Detail & Related papers (2023-09-06T19:05:31Z)
- Efficiently Learning Recoveries from Failures Under Partial Observability [31.891933360081342]
We present a general approach for robustifying manipulation strategies in a sample-efficient manner.
Our approach incrementally improves robustness by first discovering the failure modes of the current strategy.
We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning.
arXiv Detail & Related papers (2022-09-27T18:00:55Z)
- Automating Reinforcement Learning with Example-based Resets [19.86233948960312]
Existing reinforcement learning algorithms assume an episodic setting in which the agent resets to a fixed initial state distribution at the end of each episode.
We propose an extension to conventional reinforcement learning towards greater autonomy by introducing an additional agent that learns to reset in a self-supervised manner.
We apply our method to learn from scratch on a suite of simulated and real-world continuous control tasks and demonstrate that the reset agent successfully learns to reduce manual resets.
arXiv Detail & Related papers (2022-04-05T08:12:42Z)
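The reset-agent idea can be sketched as a training loop that alternates task episodes with learned reset episodes, falling back to a manual reset only when the reset agent fails. The agent and environment interfaces below (`act`, `observe`, `step`, `reset`) are assumed for illustration:

```python
def autonomous_training_loop(env, forward_agent, reset_agent,
                             n_rounds=1000, max_steps=200):
    """Alternate forward (task) episodes with learned reset episodes,
    requesting a manual reset only when the reset agent fails."""
    obs = env.reset()                      # one manual reset to start
    manual_resets = 0
    for _ in range(n_rounds):
        # Forward phase: the task agent tries to solve the task.
        for _ in range(max_steps):
            obs, reward, done, _ = env.step(forward_agent.act(obs))
            forward_agent.observe(obs, reward, done)
            if done:
                break
        # Reset phase: the reset agent tries to return to an initial-like state.
        for _ in range(max_steps):
            obs, reset_reward, reset_done, _ = env.step(reset_agent.act(obs))
            reset_agent.observe(obs, reset_reward, reset_done)
            if reset_done:
                break
        else:
            obs = env.reset()              # reset agent failed: manual reset
            manual_resets += 1
    return manual_resets
```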
- Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
arXiv Detail & Related papers (2021-06-15T11:16:49Z)
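Residual RL keeps the conventional feedback controller in the loop and learns only an additive correction on top of it. A minimal sketch of that action composition; the clipping bounds and base controller are illustrative assumptions:

```python
import numpy as np

def residual_action(state, base_controller, residual_policy, low=-1.0, high=1.0):
    """Final command = hand-designed base action + learned residual correction."""
    a_base = base_controller(state)        # e.g. a P-controller toward a target
    a_res = residual_policy(state)         # learned with RL (and demonstrations)
    return np.clip(a_base + a_res, low, high)

# Toy usage: P-controller toward the origin plus an (untrained) residual.
base = lambda s: -0.5 * s
residual = lambda s: np.zeros_like(s)      # an untrained residual is a no-op
print(residual_action(np.array([0.4, -0.2]), base, residual))
```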
- Recovery command generation towards automatic recovery in ICT systems by Seq2Seq learning [11.387419806996599]
We propose a method of estimating recovery commands by using Seq2Seq, a neural network model.
When a new failure occurs, our method estimates plausible commands that recover from the failure on the basis of collected logs.
arXiv Detail & Related papers (2020-03-24T11:34:10Z)
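A generic encoder-decoder over log tokens and command tokens captures the Seq2Seq idea of mapping failure logs to recovery commands. The PyTorch sketch below uses placeholder vocabulary sizes and is not the authors' model:

```python
import torch
import torch.nn as nn

class LogToCommandSeq2Seq(nn.Module):
    """GRU encoder over failure-log tokens, GRU decoder over command tokens."""
    def __init__(self, log_vocab, cmd_vocab, hidden=256, emb=128):
        super().__init__()
        self.enc_emb = nn.Embedding(log_vocab, emb)
        self.dec_emb = nn.Embedding(cmd_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, cmd_vocab)

    def forward(self, log_tokens, cmd_tokens):
        # Encode the failure log, then decode the recovery command with
        # teacher forcing; returns per-token command logits.
        _, h = self.encoder(self.enc_emb(log_tokens))
        dec_out, _ = self.decoder(self.dec_emb(cmd_tokens), h)
        return self.out(dec_out)

# Toy forward pass: batch of 2 logs (length 20) and commands (length 8).
model = LogToCommandSeq2Seq(log_vocab=5000, cmd_vocab=800)
logs = torch.randint(0, 5000, (2, 20))
cmds = torch.randint(0, 800, (2, 8))
logits = model(logs, cmds)                               # shape (2, 8, 800)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 800), cmds.reshape(-1))
```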
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
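Constrained policy optimization of this kind is commonly implemented as a clipped PPO-style surrogate with a Lagrangian penalty on a cost signal. The sketch below shows that standard recipe, not the authors' exact GCPO implementation:

```python
import torch

def constrained_policy_loss(ratio, advantage, cost_advantage,
                            lagrange_multiplier, clip_eps=0.2):
    """PPO-style clipped surrogate minus a Lagrangian penalty on a cost signal.
    `ratio` = pi_new(a|s) / pi_old(a|s); all inputs are per-sample tensors."""
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    reward_surrogate = torch.min(ratio * advantage, clipped * advantage).mean()
    cost_surrogate = (ratio * cost_advantage).mean()
    # Maximize reward surrogate while penalizing constraint violation.
    return -(reward_surrogate - lagrange_multiplier * cost_surrogate)

# Toy usage with random per-sample statistics.
r = torch.exp(torch.randn(64) * 0.05)
adv, cost_adv = torch.randn(64), torch.randn(64).abs()
loss = constrained_policy_loss(r, adv, cost_adv, lagrange_multiplier=0.5)
print(float(loss))
```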
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.