Safe Reinforcement Learning via Confidence-Based Filters
- URL: http://arxiv.org/abs/2207.01337v1
- Date: Mon, 4 Jul 2022 11:43:23 GMT
- Title: Safe Reinforcement Learning via Confidence-Based Filters
- Authors: Sebastian Curi, Armin Lederer, Sandra Hirche, Andreas Krause
- Abstract summary: We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
- Score: 78.39359694273575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring safety is a crucial challenge when deploying reinforcement learning
(RL) to real-world systems. We develop confidence-based safety filters, a
control-theoretic approach for certifying state safety constraints for nominal
policies learned via standard RL techniques, based on probabilistic dynamics
models. Our approach is based on a reformulation of state constraints in terms
of cost functions, reducing safety verification to a standard RL task. By
exploiting the concept of hallucinating inputs, we extend this formulation to
determine a "backup" policy that is safe for the unknown system with high
probability. Finally, the nominal policy is minimally adjusted at every time
step during a roll-out towards the backup policy, such that safe recovery can
be guaranteed afterwards. We provide formal safety guarantees, and empirically
demonstrate the effectiveness of our approach.
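To make the filtering idea concrete, below is a minimal, hypothetical per-step sketch in Python (all function names, signatures, and the linear interpolation scheme are illustrative assumptions, not the paper's implementation): at each time step the nominal action is adjusted toward the backup action only as much as needed for a probabilistic dynamics model to certify that safe recovery by the backup policy remains possible.

```python
import numpy as np

def confidence_filter_step(state, nominal_action, backup_policy, dynamics_model,
                           is_recoverable, num_candidates=20):
    """One step of a confidence-based safety filter (illustrative sketch only).

    Assumptions (not from the paper): `dynamics_model(state, action)` returns a
    collection of plausible next states, e.g. samples covering a probabilistic
    model's confidence region; `is_recoverable(next_state)` returns True if the
    backup policy is certified to keep the system safe from `next_state` with
    high probability.
    """
    backup_action = backup_policy(state)
    # Interpolate from the nominal action toward the backup action and take the
    # smallest adjustment whose plausible successors all remain recoverable.
    for alpha in np.linspace(0.0, 1.0, num_candidates):
        candidate = (1.0 - alpha) * nominal_action + alpha * backup_action
        plausible_next = dynamics_model(state, candidate)
        if all(is_recoverable(s_next) for s_next in plausible_next):
            return candidate
    # If no interpolated action can be certified, fall back to the backup policy.
    return backup_action
```

In the paper, the adjustment is obtained as a minimal deviation from the nominal policy, and the backup policy's safety is certified via the cost-based reformulation of the state constraints; the grid of interpolated actions above merely stands in for that optimization.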
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest advances in model-based RL.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess the contributions of partial state-action trajectories to safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
arXiv Detail & Related papers (2024-05-05T17:27:22Z)
- Reinforcement Learning with Adaptive Regularization for Safe Control of Critical Systems [2.126171264016785]
We propose RL with Adaptive Regularization (RL-AR), an algorithm that enables safe RL exploration.
RL-AR performs policy combination via a "focus module," which determines the appropriate combination depending on the state.
In a series of critical control applications, we demonstrate that RL-AR not only ensures safety during training but also achieves returns competitive with standard model-free RL.
arXiv Detail & Related papers (2024-04-23T16:35:14Z)
- Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the DRC and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
arXiv Detail & Related papers (2022-10-14T06:16:53Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperforms CMDP-based baseline methods in system safety rate, as measured in simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning [7.658716383823426]
A valid safety certificate is an energy function indicating that safe states have low energy.
Existing learning-based studies treat either the safety certificate or the safe control policy as prior knowledge in order to learn the other.
This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with CRL.
arXiv Detail & Related papers (2021-11-15T12:05:44Z)
- Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety [1.9573380763700712]
We introduce the feasible actor-critic (FAC) algorithm, the first model-free constrained reinforcement learning method for ensuring statewise safety.
We claim that some states are inherently unsafe no matter what policy we choose, while for other states there exist policies that ensure safety; we call such states and policies feasible.
We provide theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization.
arXiv Detail & Related papers (2021-05-22T10:40:58Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.