Efficiently Learning Recoveries from Failures Under Partial
Observability
- URL: http://arxiv.org/abs/2209.13605v1
- Date: Tue, 27 Sep 2022 18:00:55 GMT
- Title: Efficiently Learning Recoveries from Failures Under Partial
Observability
- Authors: Shivam Vats, Maxim Likhachev, Oliver Kroemer
- Abstract summary: We present a general approach for robustifying manipulation strategies in a sample-efficient manner.
Our approach incrementally improves robustness by first discovering the failure modes of the current strategy.
We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning.
- Score: 31.891933360081342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Operating under real world conditions is challenging due to the possibility
of a wide range of failures induced by partial observability. In relatively
benign settings, such failures can be overcome by retrying or executing one of
a small number of hand-engineered recovery strategies. By contrast,
contact-rich sequential manipulation tasks, like opening doors and assembling
furniture, are not amenable to exhaustive hand-engineering. To address this
issue, we present a general approach for robustifying manipulation strategies
in a sample-efficient manner. Our approach incrementally improves robustness by
first discovering the failure modes of the current strategy via exploration in
simulation and then learning additional recovery skills to handle these
failures. To ensure efficient learning, we propose an online algorithm, Value
Upper Confidence Limit (Value-UCL), that selects which failure modes to
prioritize and which state to recover to so that the expected performance
improves maximally in every training episode. We use our approach to learn
recovery skills for door-opening and evaluate them both in simulation and on a
real robot with little fine-tuning. Compared to open-loop execution, our
experiments show that even a limited amount of recovery learning improves task
success substantially from 71% to 92.4% in simulation and from 75% to 90%
on a real robot.
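The abstract describes Value-UCL only at a high level, so the sketch below shows one plausible UCB-style rule for choosing which failure mode to prioritize and which recovery state to train toward in the next episode. The `UCBRecoverySelector` class, the door-opening failure-mode names, and the bandit-style running means are illustrative assumptions, not the authors' implementation.

```python
# Illustrative UCB-style selection over failure modes; NOT the authors' Value-UCL code.
# Each (failure mode, candidate recovery state) pair is treated as a bandit arm whose
# reward is the improvement in expected task value after one training episode.
import math
import random
from collections import defaultdict


class UCBRecoverySelector:
    def __init__(self, arms, exploration_weight=1.0):
        self.arms = list(arms)               # (failure_mode, recovery_state) pairs
        self.c = exploration_weight
        self.counts = defaultdict(int)       # training episodes spent on each arm
        self.mean_gain = defaultdict(float)  # running mean of observed value improvement

    def select(self):
        """Pick the arm whose upper confidence bound on value gain is largest."""
        total = sum(self.counts[a] for a in self.arms) + 1

        def ucb(arm):
            n = self.counts[arm]
            if n == 0:
                return float("inf")          # try every arm at least once
            return self.mean_gain[arm] + self.c * math.sqrt(2.0 * math.log(total) / n)

        return max(self.arms, key=ucb)

    def update(self, arm, value_gain):
        """Record the value improvement measured after training on `arm`."""
        self.counts[arm] += 1
        self.mean_gain[arm] += (value_gain - self.mean_gain[arm]) / self.counts[arm]


# Hypothetical usage with made-up failure modes for a door-opening task.
arms = [("missed_handle", "pre_grasp"), ("slipped_grasp", "re_grasp"),
        ("door_stuck", "pull_again")]
selector = UCBRecoverySelector(arms)
for _ in range(20):
    arm = selector.select()
    selector.update(arm, random.random())    # stand-in for the measured improvement
print(dict(selector.mean_gain))
```

The actual algorithm presumably tracks learned value estimates rather than bandit sample means, but the selection logic, favoring the arm with the most optimistic estimate of expected improvement, has the same shape.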
Related papers
- RecoveryChaining: Learning Local Recovery Policies for Robust Manipulation [41.38308130776887]
We propose to use hierarchical reinforcement learning to learn a separate recovery policy for a robot.
The recovery policy is triggered when a failure is detected based on sensory observations and seeks to take the robot to a state from which it can complete the task.
We evaluate our approach in three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by baselines.
arXiv Detail & Related papers (2024-10-17T19:14:43Z)
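The summary above gives the control flow of RecoveryChaining (run the nominal controller, detect a failure from sensory observations, hand control to a learned recovery policy that drives the robot back to a state from which it can complete the task) without implementation detail. The loop below is a hedged sketch of that handoff; `nominal_controller`, `failure_detected`, and `recovery_policy` are placeholder callables, not the paper's API.

```python
# Sketch of a failure-triggered recovery handoff; illustrative, not the paper's code.
def run_with_recovery(env, nominal_controller, failure_detected, recovery_policy,
                      max_steps=500):
    """Execute the nominal controller; on a detected failure, switch to the learned
    recovery policy until it reports that a recoverable state has been reached."""
    obs = env.reset()
    recovering = False
    for _ in range(max_steps):
        if recovering:
            action, done_recovering = recovery_policy(obs)
            if done_recovering:
                recovering = False           # hand control back to the nominal controller
        else:
            action = nominal_controller(obs)
        obs, _, done, info = env.step(action)
        if done:
            return info.get("success", False)
        if not recovering and failure_detected(obs):
            recovering = True                # sensory observations indicate a failure
    return False
```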
- FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance on previously demonstrated and on entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z)
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
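The summary above names gradient ascent (GA) unlearning and controlling methods that regulate how far unlearning goes, but not their form. The snippet below sketches one common way to bound a GA update, skipping the ascent step once the forget-set loss passes a ceiling; the `loss_ceiling` threshold and the model/optimizer interface are assumptions, not the paper's mechanism.

```python
# One gradient-ascent unlearning step with a simple control on its extent.
# Sketch only: the loss ceiling is an assumed control, not the paper's method.
import torch.nn.functional as F


def unlearning_step(model, optimizer, forget_batch, loss_ceiling=5.0):
    """Ascend the forget-set loss, but stop once it exceeds `loss_ceiling`,
    which bounds how much is unlearned."""
    input_ids, labels = forget_batch
    logits = model(input_ids)                                # [batch, seq, vocab]
    nll = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    if nll.item() >= loss_ceiling:
        return nll.item()                                    # forgotten enough; skip the step
    optimizer.zero_grad()
    (-nll).backward()                                        # ascent = descent on the negated loss
    optimizer.step()
    return nll.item()
```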
- Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators (BTMG) Approach for Failure Management [0.0]
We propose a novel approach that models recovery behaviors as adaptable robotic skills, leveraging the Behavior Trees and Motion Generators (BTMG) framework for policy representation.
We assess our methodology through a series of progressively challenging scenarios within a peg-in-a-hole task, demonstrating the approach's effectiveness in enhancing operational efficiency and task success rates in collaborative robotics settings.
arXiv Detail & Related papers (2024-04-09T08:56:43Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
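As a rough illustration of zero-shot structured pruning of convolutional filters, the sketch below drops the output filters of a Conv2d layer with the smallest L1 weight norms and rebuilds a thinner layer without any fine-tuning. The L1 criterion and the keep ratio are common choices assumed here, not necessarily the ones used in the paper.

```python
# Sketch of structured (filter-level) pruning of a pretrained Conv2d layer by L1 norm.
# Illustrative only; the criterion and ratio are assumptions, not the paper's recipe.
import torch
import torch.nn as nn


def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.8) -> nn.Conv2d:
    """Return a new Conv2d keeping the `keep_ratio` fraction of output filters
    with the largest L1 weight norm, with no fine-tuning ("zero-shot")."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))    # one norm per output filter
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    keep = torch.topk(norms, n_keep).indices.sort().values   # preserve original filter order
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned


# Example: the next layer's input channels would also need adjusting, omitted here.
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
print(prune_conv_filters(layer, keep_ratio=0.8))              # Conv2d(64, 102, ...)
```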
- Asking for Help: Failure Prediction in Behavioral Cloning through Value Approximation [8.993237527071756]
We introduce Behavioral Cloning Value Approximation (BCVA), an approach to learning a state value function based on and trained jointly with a Behavioral Cloning policy.
We demonstrate the effectiveness of BCVA by applying it to the challenging mobile manipulation task of latched-door opening.
arXiv Detail & Related papers (2023-02-08T20:56:23Z)
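The summary above says BCVA learns a state-value function jointly with the behavioral-cloning policy and uses it to predict failures so the robot can ask for help. The snippet sketches only that decision rule, thresholding a learned value estimate at execution time; `policy`, `value_net`, and the threshold are placeholders rather than BCVA's actual interface.

```python
# Sketch of value-based failure prediction at execution time. Illustrative only;
# `policy`, `value_net`, and the threshold are placeholders, not BCVA's actual API.
def act_or_ask_for_help(obs, policy, value_net, help_threshold=0.3):
    """Act with the cloned policy while the predicted state value stays above a
    threshold; otherwise flag a likely failure and request human help."""
    predicted_value = value_net(obs)        # value head trained jointly with the BC policy
    if predicted_value < help_threshold:
        return None, True                   # (no action, ask for help)
    return policy(obs), False               # (action, proceed autonomously)
```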
- Anchored Learning for On-the-Fly Adaptation -- Extended Technical Report [45.123633153460034]
This study presents "anchor critics", a novel strategy for enhancing the robustness of reinforcement learning (RL) agents in crossing the sim-to-real gap.
We identify that naive fine-tuning approaches lead to catastrophic forgetting, where policies maintain high rewards on frequently encountered states but lose performance on rarer, yet critical scenarios.
Evaluations demonstrate that our approach enables behavior retention in sim-to-sim gymnasium tasks and in sim-to-real scenarios with racing quadrotors, achieving a near-50% reduction in power consumption while maintaining controllable, stable flight.
arXiv Detail & Related papers (2023-01-17T16:16:53Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
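The f-divergence minimization framework referenced above estimates a divergence between expert and imitator distributions with a learned critic. As a rough, hedged illustration of that framework (not the paper's reparameterization trick), the snippet below computes the standard variational lower bound on the KL divergence from samples; the critic architecture and feature dimension are arbitrary choices.

```python
# Generic variational (f-GAN style) lower bound on KL(p_expert || p_policy), usable as
# an adversarial imitation signal. This is the standard bound from the f-divergence
# framework, not the paper's reparameterized variant.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))


def kl_lower_bound(expert_sa, policy_sa):
    """KL(p || q) >= E_p[T(x)] - E_q[exp(T(x) - 1)] for any critic T."""
    return critic(expert_sa).mean() - torch.exp(critic(policy_sa) - 1.0).mean()


# Toy usage: the critic is trained to maximize this bound (tighten the estimate),
# while the imitation policy is updated to minimize it.
expert_batch = torch.randn(32, 4)   # stand-in for expert state-action features
policy_batch = torch.randn(32, 4)   # stand-in for imitator state-action features
print(kl_lower_bound(expert_batch, policy_batch).item())
```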