Efficiently Learning Recoveries from Failures Under Partial
Observability
- URL: http://arxiv.org/abs/2209.13605v1
- Date: Tue, 27 Sep 2022 18:00:55 GMT
- Title: Efficiently Learning Recoveries from Failures Under Partial
Observability
- Authors: Shivam Vats, Maxim Likhachev, Oliver Kroemer
- Abstract summary: We present a general approach for robustifying manipulation strategies in a sample-efficient manner.
Our approach incrementally improves robustness by first discovering the failure modes of the current strategy.
We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning.
- Score: 31.891933360081342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Operating under real world conditions is challenging due to the possibility
of a wide range of failures induced by partial observability. In relatively
benign settings, such failures can be overcome by retrying or executing one of
a small number of hand-engineered recovery strategies. By contrast,
contact-rich sequential manipulation tasks, like opening doors and assembling
furniture, are not amenable to exhaustive hand-engineering. To address this
issue, we present a general approach for robustifying manipulation strategies
in a sample-efficient manner. Our approach incrementally improves robustness by
first discovering the failure modes of the current strategy via exploration in
simulation and then learning additional recovery skills to handle these
failures. To ensure efficient learning, we propose an online algorithm, Value
Upper Confidence Limit (Value-UCL), that selects which failure modes to
prioritize and which state to recover to so that the expected performance
improves maximally in every training episode. We use our approach to learn
recovery skills for door-opening and evaluate them both in simulation and on a
real robot with little fine-tuning. Compared to open-loop execution, our
experiments show that even a limited amount of recovery learning improves task
success substantially from 71% to 92.4% in simulation and from 75% to 90%
on a real robot.
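The abstract describes Value-UCL only at a high level, so the sketch below shows one plausible UCB-style rule for choosing which failure mode to prioritize and which recovery state to train toward in the next episode. The `UCBRecoverySelector` class, the door-opening failure-mode names, and the bandit-style running means are illustrative assumptions, not the authors' implementation.

```python
# Illustrative UCB-style selection over failure modes; NOT the authors' Value-UCL code.
# Each (failure mode, candidate recovery state) pair is treated as a bandit arm whose
# reward is the improvement in expected task value after one training episode.
import math
import random
from collections import defaultdict


class UCBRecoverySelector:
    def __init__(self, arms, exploration_weight=1.0):
        self.arms = list(arms)               # (failure_mode, recovery_state) pairs
        self.c = exploration_weight
        self.counts = defaultdict(int)       # training episodes spent on each arm
        self.mean_gain = defaultdict(float)  # running mean of observed value improvement

    def select(self):
        """Pick the arm whose upper confidence bound on value gain is largest."""
        total = sum(self.counts[a] for a in self.arms) + 1

        def ucb(arm):
            n = self.counts[arm]
            if n == 0:
                return float("inf")          # try every arm at least once
            return self.mean_gain[arm] + self.c * math.sqrt(2.0 * math.log(total) / n)

        return max(self.arms, key=ucb)

    def update(self, arm, value_gain):
        """Record the value improvement measured after training on `arm`."""
        self.counts[arm] += 1
        self.mean_gain[arm] += (value_gain - self.mean_gain[arm]) / self.counts[arm]


# Hypothetical usage with made-up failure modes for a door-opening task.
arms = [("missed_handle", "pre_grasp"), ("slipped_grasp", "re_grasp"),
        ("door_stuck", "pull_again")]
selector = UCBRecoverySelector(arms)
for _ in range(20):
    arm = selector.select()
    selector.update(arm, random.random())    # stand-in for the measured improvement
print(dict(selector.mean_gain))
```

The actual algorithm presumably tracks learned value estimates rather than bandit sample means, but the selection logic, favoring the arm with the most optimistic estimate of expected improvement, has the same shape.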
Related papers
- RecoveryChaining: Learning Local Recovery Policies for Robust Manipulation [41.38308130776887]
We propose to use hierarchical reinforcement learning to learn a separate recovery policy for a robot.
The recovery policy is triggered when a failure is detected based on sensory observations and seeks to take the robot to a state from which it can complete the task.
We evaluate our approach in three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by baselines.
arXiv Detail & Related papers (2024-10-17T19:14:43Z)
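The summary above gives the control flow of RecoveryChaining (run the nominal controller, detect a failure from sensory observations, hand control to a learned recovery policy that drives the robot back to a state from which it can complete the task) without implementation detail. The loop below is a hedged sketch of that handoff; `nominal_controller`, `failure_detected`, and `recovery_policy` are placeholder callables, not the paper's API.

```python
# Sketch of a failure-triggered recovery handoff; illustrative, not the paper's code.
def run_with_recovery(env, nominal_controller, failure_detected, recovery_policy,
                      max_steps=500):
    """Execute the nominal controller; on a detected failure, switch to the learned
    recovery policy until it reports that a recoverable state has been reached."""
    obs = env.reset()
    recovering = False
    for _ in range(max_steps):
        if recovering:
            action, done_recovering = recovery_policy(obs)
            if done_recovering:
                recovering = False           # hand control back to the nominal controller
        else:
            action = nominal_controller(obs)
        obs, _, done, info = env.step(action)
        if done:
            return info.get("success", False)
        if not recovering and failure_detected(obs):
            recovering = True                # sensory observations indicate a failure
    return False
```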
- FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance on previously demonstrated and on entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z)
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
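The summary above names gradient ascent (GA) unlearning and controlling methods that regulate how far unlearning goes, but not their form. The snippet below sketches one common way to bound a GA update, skipping the ascent step once the forget-set loss passes a ceiling; the `loss_ceiling` threshold and the model/optimizer interface are assumptions, not the paper's mechanism.

```python
# One gradient-ascent unlearning step with a simple control on its extent.
# Sketch only: the loss ceiling is an assumed control, not the paper's method.
import torch.nn.functional as F


def unlearning_step(model, optimizer, forget_batch, loss_ceiling=5.0):
    """Ascend the forget-set loss, but stop once it exceeds `loss_ceiling`,
    which bounds how much is unlearned."""
    input_ids, labels = forget_batch
    logits = model(input_ids)                                # [batch, seq, vocab]
    nll = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    if nll.item() >= loss_ceiling:
        return nll.item()                                    # forgotten enough; skip the step
    optimizer.zero_grad()
    (-nll).backward()                                        # ascent = descent on the negated loss
    optimizer.step()
    return nll.item()
```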
- Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators (BTMG) Approach for Failure Management [0.0]
We propose a novel approach that models recovery behaviors as adaptable robotic skills, leveraging the Behavior Trees and Motion Generators (BTMG) framework for policy representation.
We assess our methodology through a series of progressively challenging scenarios within a peg-in-a-hole task, demonstrating the approach's effectiveness in enhancing operational efficiency and task success rates in collaborative robotics settings.
arXiv Detail & Related papers (2024-04-09T08:56:43Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
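As a rough illustration of zero-shot structured pruning of convolutional filters, the sketch below drops the output filters of a Conv2d layer with the smallest L1 weight norms and rebuilds a thinner layer without any fine-tuning. The L1 criterion and the keep ratio are common choices assumed here, not necessarily the ones used in the paper.

```python
# Sketch of structured (filter-level) pruning of a pretrained Conv2d layer by L1 norm.
# Illustrative only; the criterion and ratio are assumptions, not the paper's recipe.
import torch
import torch.nn as nn


def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.8) -> nn.Conv2d:
    """Return a new Conv2d keeping the `keep_ratio` fraction of output filters
    with the largest L1 weight norm, with no fine-tuning ("zero-shot")."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))    # one norm per output filter
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    keep = torch.topk(norms, n_keep).indices.sort().values   # preserve original filter order
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned


# Example: the next layer's input channels would also need adjusting, omitted here.
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
print(prune_conv_filters(layer, keep_ratio=0.8))              # Conv2d(64, 102, ...)
```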
- Asking for Help: Failure Prediction in Behavioral Cloning through Value Approximation [8.993237527071756]
We introduce Behavioral Cloning Value Approximation (BCVA), an approach to learning a state value function based on and trained jointly with a Behavioral Cloning policy.
We demonstrate the effectiveness of BCVA by applying it to the challenging mobile manipulation task of latched-door opening.
arXiv Detail & Related papers (2023-02-08T20:56:23Z)
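The summary above says BCVA learns a state-value function jointly with the behavioral-cloning policy and uses it to predict failures so the robot can ask for help. The snippet sketches only that decision rule, thresholding a learned value estimate at execution time; `policy`, `value_net`, and the threshold are placeholders rather than BCVA's actual interface.

```python
# Sketch of value-based failure prediction at execution time. Illustrative only;
# `policy`, `value_net`, and the threshold are placeholders, not BCVA's actual API.
def act_or_ask_for_help(obs, policy, value_net, help_threshold=0.3):
    """Act with the cloned policy while the predicted state value stays above a
    threshold; otherwise flag a likely failure and request human help."""
    predicted_value = value_net(obs)        # value head trained jointly with the BC policy
    if predicted_value < help_threshold:
        return None, True                   # (no action, ask for help)
    return policy(obs), False               # (action, proceed autonomously)
```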
- Anchored Learning for On-the-Fly Adaptation -- Extended Technical Report [45.123633153460034]
This study presents "anchor critics", a novel strategy for enhancing the robustness of reinforcement learning (RL) agents in crossing the sim-to-real gap.
We identify that naive fine-tuning approaches lead to catastrophic forgetting, where policies maintain high rewards on frequently encountered states but lose performance on rarer, yet critical scenarios.
Evaluations demonstrate that our approach enables behavior retention in sim-to-sim gymnasium tasks and in sim-to-real scenarios with racing quadrotors, achieving a near-50% reduction in power consumption while maintaining controllable, stable flight.
arXiv Detail & Related papers (2023-01-17T16:16:53Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
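The f-divergence minimization framework referenced above estimates a divergence between expert and imitator distributions with a learned critic. As a rough, hedged illustration of that framework (not the paper's reparameterization trick), the snippet below computes the standard variational lower bound on the KL divergence from samples; the critic architecture and feature dimension are arbitrary choices.

```python
# Generic variational (f-GAN style) lower bound on KL(p_expert || p_policy), usable as
# an adversarial imitation signal. This is the standard bound from the f-divergence
# framework, not the paper's reparameterized variant.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))


def kl_lower_bound(expert_sa, policy_sa):
    """KL(p || q) >= E_p[T(x)] - E_q[exp(T(x) - 1)] for any critic T."""
    return critic(expert_sa).mean() - torch.exp(critic(policy_sa) - 1.0).mean()


# Toy usage: the critic is trained to maximize this bound (tighten the estimate),
# while the imitation policy is updated to minimize it.
expert_batch = torch.randn(32, 4)   # stand-in for expert state-action features
policy_batch = torch.randn(32, 4)   # stand-in for imitator state-action features
print(kl_lower_bound(expert_batch, policy_batch).item())
```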