Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
- URL: http://arxiv.org/abs/2010.15920v2
- Date: Mon, 17 May 2021 21:20:48 GMT
- Title: Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
- Authors: Brijen Thananjeyan, Ashwin Balakrishna, Suraj Nair, Michael Luo,
Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea
Finn, Ken Goldberg
- Abstract summary: Recovery RL uses offline data to learn about constraint-violating zones before policy learning.
We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task.
Results suggest that Recovery RL trades off constraint violations and task successes 2-20 times more efficiently in simulation domains and 3 times more efficiently in physical experiments.
- Score: 81.49106778460238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety remains a central obstacle preventing widespread use of RL in the real
world: learning new tasks in uncertain environments requires extensive
exploration, but safety requires limiting exploration. We propose Recovery RL,
an algorithm which navigates this tradeoff by (1) leveraging offline data to
learn about constraint violating zones before policy learning and (2)
separating the goals of improving task performance and constraint satisfaction
across two policies: a task policy that only optimizes the task reward and a
recovery policy that guides the agent to safety when constraint violation is
likely. We evaluate Recovery RL on 6 simulation domains, including two
contact-rich manipulation tasks and an image-based navigation task, and an
image-based obstacle avoidance task on a physical robot. We compare Recovery RL
to 5 prior safe RL methods which jointly optimize for task performance and
safety via constrained optimization or reward shaping and find that Recovery RL
outperforms the next best prior method across all domains. Results suggest that
Recovery RL trades off constraint violations and task successes 2 - 20 times
more efficiently in simulation domains and 3 times more efficiently in physical
experiments. See https://tinyurl.com/rl-recovery for videos and supplementary
material.
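The switching mechanism the abstract describes reduces to a small amount of glue at action-selection time. Below is a minimal, illustrative sketch (not the authors' code): it assumes a safety critic `q_risk(s, a)` pretrained on offline constraint-violation data, a risk threshold `eps_risk`, and callable `task_policy` / `recovery_policy`; all names are hypothetical.

```python
def select_action(state, task_policy, recovery_policy, q_risk, eps_risk=0.3):
    """Recovery RL-style switching: run the task policy unless the safety
    critic predicts its proposed action makes a constraint violation likely."""
    a_task = task_policy(state)
    if q_risk(state, a_task) <= eps_risk:
        return a_task                    # low predicted risk: optimize the task
    return recovery_policy(state)        # high predicted risk: steer to safety

# Toy usage with stand-in callables (1-D state and action):
task_policy = lambda s: 1.0
recovery_policy = lambda s: -1.0
q_risk = lambda s, a: 0.9 if s > 0.8 else 0.1   # "risky" region near s = 1
assert select_action(0.5, task_policy, recovery_policy, q_risk) == 1.0
assert select_action(0.9, task_policy, recovery_policy, q_risk) == -1.0
```

Per the paper, the task policy trains only on the task reward while the recovery policy is trained to drive the risk estimate down, which is what keeps the two objectives decoupled.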
Related papers
- Offline Goal-Conditioned Reinforcement Learning for Safety-Critical
Tasks with Recovery Policy [4.854443247023496]
Offline goal-conditioned reinforcement learning (GCRL) aims to solve goal-reaching tasks with sparse rewards from an offline dataset.
We propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals.
arXiv Detail & Related papers (2024-03-04T05:20:57Z)
- Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning
in Surgical Robotic Environments [4.2569494803130565]
We introduce an innovative algorithm designed to assign rewards to offline trajectories, using a small number of high-quality expert demonstrations.
This approach circumvents the need for handcrafted rewards, unlocking the potential to harness vast datasets for policy learning.
arXiv Detail & Related papers (2023-10-13T03:39:15Z)
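The optimal-transport entry above assigns proxy rewards to unlabeled offline trajectories by matching them against a handful of expert demonstrations. Here is a generic sketch of that idea using entropic OT (Sinkhorn iterations) over observation distances; the Euclidean cost, uniform marginals, and regularization are assumptions, not the paper's exact recipe.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.05, n_iter=200):
    """Entropic-OT transport plan between two uniform empirical measures.
    (Scale distances so exp(-cost/reg) does not underflow.)"""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_rewards(agent_obs, expert_obs):
    """Per-step proxy reward: negative transport cost each agent step pays
    to be matched onto the expert demonstration."""
    cost = np.linalg.norm(agent_obs[:, None, :] - expert_obs[None, :, :], axis=-1)
    plan = sinkhorn_plan(cost)
    return -(plan * cost).sum(axis=1) * len(agent_obs)   # each row carries mass 1/n
```

Steps that track the expert closely receive rewards near zero; steps far from any demonstration are penalized.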
- Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration [75.51109230296568]
We argue that extracting an expert policy from offline data to guide online exploration is a promising solution to mitigate the conservativeness issue.
We propose Guided Online Distillation (GOLD), an offline-to-online safe RL framework.
GOLD distills an offline Decision Transformer (DT) policy into a lightweight policy network through guided online safe RL training, outperforming both the offline DT policy and online safe RL algorithms.
arXiv Detail & Related papers (2023-09-18T00:22:59Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
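A minimal sketch of the multiplicative value function described in the entry above: the safety critic's probability of staying constraint-free multiplicatively discounts the reward critic's constraint-free return. The critic names and the candidate-set maximization below are illustrative assumptions.

```python
def multiplicative_value(q_reward, p_safe):
    """Combine the two critics: p_safe in [0, 1] scales down the return
    estimate for risky actions (assumes non-negative returns, so the
    product preserves the intended ordering)."""
    return p_safe * q_reward

def greedy_action(state, candidates, reward_critic, safety_critic):
    """Pick the candidate action with the best safety-discounted value."""
    scores = [multiplicative_value(reward_critic(state, a), safety_critic(state, a))
              for a in candidates]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```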
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this area from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
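The USL entry combines a safety-optimization stage with a post-hoc projection of unsafe actions. The sketch below illustrates only the projection idea: nudge a proposed action down the gradient of a learned cost critic until its predicted cost clears the threshold. The finite-difference gradient and all names are assumptions; USL itself backpropagates through the critic.

```python
import numpy as np

def project_action(state, action, q_cost, threshold=0.0, lr=0.1,
                   n_steps=20, eps=1e-3):
    """Generic safety projection: correct `action` until the cost critic
    predicts a constraint cost at or below `threshold`."""
    a = np.asarray(action, dtype=float).copy()
    for _ in range(n_steps):
        if q_cost(state, a) <= threshold:
            break                               # already predicted safe
        # Finite-difference gradient of the cost critic w.r.t. the action.
        grad = np.array([(q_cost(state, a + eps * e) - q_cost(state, a - eps * e))
                         / (2 * eps) for e in np.eye(a.size)])
        a -= lr * grad
    return a
```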
- Constraint-Guided Reinforcement Learning: Augmenting the Agent-Environment-Interaction [10.203602318836445]
Reinforcement Learning (RL) agents have achieved great success in solving tasks with large observation and action spaces from limited feedback.
This paper discusses the engineering of reliable agents via the integration of deep RL with constraint-based augmentation models.
Our results show that constraint guidance provides reliability improvements and safer behavior, as well as accelerated training.
arXiv Detail & Related papers (2021-04-24T10:04:14Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
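For the Critic Regularized Regression entry, the core of CRR is weighted behavioral cloning in which the critic decides how much each dataset action counts. Below is a sketch of the weighting only, assuming precomputed Q and V estimates; the binary-indicator and clipped-exponential variants follow the forms discussed in the CRR paper, but the hyperparameters are illustrative.

```python
import numpy as np

def crr_weights(q_values, v_values, mode="exp", beta=1.0, clip=20.0):
    """CRR-style weights from critic advantages A = Q - V: 'binary' keeps
    only actions the critic prefers over the policy average; 'exp'
    softly upweights them (clipped for stability)."""
    adv = q_values - v_values
    if mode == "binary":
        return (adv > 0).astype(float)
    return np.minimum(np.exp(adv / beta), clip)

# The policy is then fit by weighted maximum likelihood on dataset actions:
#   loss = -mean( crr_weights(...) * log_pi(action | state) )
```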