Learning Barrier Certificates: Towards Safe Reinforcement Learning with
Zero Training-time Violations
- URL: http://arxiv.org/abs/2108.01846v1
- Date: Wed, 4 Aug 2021 04:59:05 GMT
- Title: Learning Barrier Certificates: Towards Safe Reinforcement Learning with
Zero Training-time Violations
- Authors: Yuping Luo, Tengyu Ma
- Abstract summary: This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
- Score: 64.39401322671803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training-time safety violations have been a major concern when we deploy
reinforcement learning algorithms in the real world. This paper explores the
possibility of safe RL algorithms with zero training-time safety violations in
the challenging setting where we are only given a safe but trivial-reward
initial policy without any prior knowledge of the dynamics model and additional
offline data. We propose an algorithm, Co-trained Barrier Certificate for Safe
RL (CRABS), which iteratively learns barrier certificates, dynamics models, and
policies. The barrier certificates, learned via adversarial training, ensure
the policy's safety, assuming a calibrated learned dynamics model. We also add a
regularization term to encourage larger certified regions to enable better
exploration. Empirical simulations show that zero safety violations are already
challenging for a suite of simple environments with only 2-4 dimensional state
space, especially if high-reward policies have to visit regions near the safety
boundary. Prior methods require hundreds of violations to achieve decent
rewards on these tasks, whereas our proposed algorithms incur zero violations.
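The core condition a barrier certificate enforces, under a calibrated learned dynamics model, is forward invariance: the policy must map every state in the certified region back into the certified region. A minimal sketch of that check, with toy linear dynamics, a hand-picked quadratic certificate, and random counterexample search standing in for adversarial training (all dynamics, names, and shapes here are illustrative assumptions, not CRABS's actual models):

```python
import numpy as np

def dynamics(x, u):
    """Toy discrete-time dynamics x' = Ax + Bu (stand-in for a learned model)."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    return A @ x + B @ u

def certificate(x):
    """Candidate barrier h(x) >= 0 inside the certified safe region
    (here, the unit ball)."""
    return 1.0 - x @ x

def policy(x):
    """Simple feedback controller (stand-in for a learned policy)."""
    return np.array([-(x[0] + x[1])])

def violates_certificate(x, eps=0.0):
    """Forward-invariance check at a single certified state x:
    flag x if it is certified (h(x) >= 0) but the policy's next
    state leaves the certified region (h(x') < eps)."""
    return certificate(x) >= 0 and certificate(dynamics(x, policy(x))) < eps

# Crude counterexample search over sampled states; adversarial training
# would instead optimize for the worst-case state directly.
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(1000, 2))
counterexamples = [x for x in samples if violates_certificate(x)]
```

In CRABS itself, any counterexample found would drive further adversarial training of the certificate and policy, rather than merely being collected as above.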
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Reinforcement Learning with Ensemble Model Predictive Safety Certification [2.658598582858331]
Unsupervised exploration prevents the deployment of reinforcement learning algorithms on safety-critical tasks.
We propose a new algorithm that combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent.
Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.
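The action-correction idea above can be sketched as a generic safety filter: accept the learning agent's action only if the model predicts a safe next state, otherwise substitute a fallback. This is a simplified stand-in, not the paper's tube-based MPC (which would solve a small constrained optimization over a horizon); the dynamics and constraint here are illustrative assumptions:

```python
import numpy as np

def is_safe(x):
    """Toy safety constraint: every state component must stay in [-1, 1]."""
    return bool(np.all(np.abs(x) <= 1.0))

def model(x, u):
    """Stand-in one-step dynamics model: x' = x + 0.1 * u."""
    return x + 0.1 * u

def safety_filter(x, u_rl, u_backup):
    """Keep the RL action if the model predicts the next state is safe;
    otherwise fall back to a known-safe backup action."""
    return u_rl if is_safe(model(x, u_rl)) else u_backup

# Near the boundary, the filter overrides the agent's unsafe push.
x = np.array([0.95])
u = safety_filter(x, u_rl=np.array([1.0]), u_backup=np.array([-1.0]))
```

Tube-based MPC strengthens this one-step check by accounting for model uncertainty over a whole predicted trajectory tube.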
arXiv Detail & Related papers (2024-02-06T17:42:39Z)
- Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery [13.333197887318168]
Safety is one of the main challenges in applying reinforcement learning to real-world tasks.
We propose a method to construct a boundary that discriminates safe and unsafe states.
Our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-06-24T12:02:50Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the DRC and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
arXiv Detail & Related papers (2022-10-14T06:16:53Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.