SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition
- URL: http://arxiv.org/abs/2202.04849v1
- Date: Thu, 10 Feb 2022 05:43:41 GMT
- Title: SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition
- Authors: Dylan Slack, Yinlam Chow, Bo Dai, and Nevan Wichers
- Abstract summary: We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
- Score: 59.94644674087599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though many reinforcement learning (RL) problems involve learning policies in
settings with difficult-to-specify safety constraints and sparse rewards,
current methods struggle to acquire successful and safe policies. Methods that
extract useful policy primitives from offline datasets using generative
modeling have recently shown promise at accelerating RL in these more complex
settings. However, we discover that current primitive-learning methods may not
be well-equipped for safe policy learning and may promote unsafe behavior due
to their tendency to ignore data from undesirable behaviors. To overcome these
issues, we propose SAFEty skill pRiors (SAFER), an algorithm that accelerates
policy learning on complex control tasks under safety constraints. Through
principled training on an offline dataset, SAFER learns to extract safe
primitive skills. In the inference stage, policies trained with SAFER learn to
compose safe skills into successful policies. We theoretically characterize why
SAFER can enforce safe policy learning and demonstrate its effectiveness on
several complex safety-critical robotic grasping tasks inspired by the game
Operation, in which SAFER outperforms baseline methods in learning successful
policies and enforcing safety.
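The abstract outlines the general recipe: fit a generative model of short action segments ("skills") on an offline dataset while accounting for unsafe data, then train a downstream policy that composes those skills. The snippet below is a minimal, hypothetical PyTorch sketch of that skill-prior idea, not the authors' released implementation; the VAE architecture, the binary safety labels, and the simple safety weighting of the loss are all assumptions made for illustration.

```python
# Hedged sketch (not the authors' code): a minimal skill-prior setup in PyTorch.
# It encodes fixed-length action segments from an offline dataset into latent "skills"
# with a small VAE, which stands in for the generative-modeling step the abstract mentions.
# Tensor shapes, network sizes, safety labels, and the synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

ACT_DIM, SKILL_LEN, LATENT_DIM = 4, 10, 8  # assumed sizes

class SkillVAE(nn.Module):
    """Encodes a length-SKILL_LEN action segment into a latent skill z and decodes it back."""
    def __init__(self):
        super().__init__()
        flat = ACT_DIM * SKILL_LEN
        self.enc = nn.Sequential(nn.Linear(flat, 128), nn.ReLU(), nn.Linear(128, 2 * LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, flat))

    def forward(self, actions):                        # actions: (B, SKILL_LEN, ACT_DIM)
        h = self.enc(actions.flatten(1))
        mu, log_std = h.chunk(2, dim=-1)
        z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterisation trick
        recon = self.dec(z).view_as(actions)
        kl = 0.5 * (mu.pow(2) + (2 * log_std).exp() - 2 * log_std - 1).sum(-1)
        return recon, kl

def train_skill_prior(dataset_actions, is_safe, epochs=5, beta=1e-2):
    """dataset_actions: (N, SKILL_LEN, ACT_DIM) offline segments; is_safe: (N,) 0/1 labels.
    Weighting the loss by a safety label is only a stand-in for the paper's principled
    handling of undesirable data, which the abstract does not spell out."""
    vae = SkillVAE()
    opt = torch.optim.Adam(vae.parameters(), lr=3e-4)
    for _ in range(epochs):
        recon, kl = vae(dataset_actions)
        recon_err = (recon - dataset_actions).pow(2).mean(dim=(1, 2))
        loss = (is_safe * (recon_err + beta * kl)).mean()  # segments labeled unsafe contribute nothing
        opt.zero_grad(); loss.backward(); opt.step()
    return vae

if __name__ == "__main__":
    fake_actions = torch.randn(256, SKILL_LEN, ACT_DIM)    # placeholder offline data
    fake_safe = (torch.rand(256) > 0.2).float()
    vae = train_skill_prior(fake_actions, fake_safe)
    # A downstream RL policy would then act in the LATENT_DIM-dimensional skill space,
    # decoding each chosen skill with vae.dec before executing it in the environment.
```

Under this kind of setup, the high-level policy picks latent skills rather than raw actions, so the learned prior biases exploration toward behavior seen in the safe portion of the data.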
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest advances in model-based RL.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration [75.51109230296568]
We argue that extracting an expert policy from offline data to guide online exploration is a promising way to mitigate the conservativeness issue.
We propose Guided Online Distillation (GOLD), an offline-to-online safe RL framework.
GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.
arXiv Detail & Related papers (2023-09-18T00:22:59Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
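Relating to the "Learning to be Safe: Deep RL with a Safety Critic" entry directly above: one common way to use a learned notion of safety to constrain future behavior is to score candidate actions with a safety critic and reject the risky ones. The sketch below illustrates that filtering step only; the critic architecture, the rejection threshold, and the placeholder policy are assumptions, not that paper's exact algorithm.

```python
# Hedged sketch: filtering candidate actions with a learned safety critic.
# The critic, threshold, and sampling scheme below are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2  # assumed sizes

class SafetyCritic(nn.Module):
    """Predicts the probability that taking action a in state s leads to a safety violation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def safe_action(obs, policy, critic, n_candidates=32, risk_threshold=0.1):
    """Sample candidate actions from the policy, drop those the critic deems too risky,
    and fall back to the least risky candidate if every one exceeds the threshold."""
    cands = policy(obs.expand(n_candidates, -1))            # (n_candidates, ACT_DIM)
    risk = critic(obs.expand(n_candidates, -1), cands)      # (n_candidates,) in [0, 1]
    ok = risk <= risk_threshold
    idx = risk.argmin() if not ok.any() else risk.masked_fill(~ok, 2.0).argmin()
    return cands[idx]

if __name__ == "__main__":
    policy = lambda o: torch.tanh(torch.randn(o.shape[0], ACT_DIM))  # placeholder stochastic policy
    critic = SafetyCritic()                                          # untrained, for illustration only
    print(safe_action(torch.zeros(OBS_DIM), policy, critic))
```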