Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve
Optimality?
- URL: http://arxiv.org/abs/2004.00915v1
- Date: Thu, 2 Apr 2020 10:11:30 GMT
- Title: Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve
Optimality?
- Authors: Sebastien Gros, Mario Zanon, Alberto Bemporad
- Abstract summary: Reinforcement Learning (RL) still struggles to deliver formal guarantees on the closed-loop behavior of the learned policy.
Recent contributions propose to rely on projections of the inputs delivered by the learned policy into a safe set.
This paper addresses this issue in the context of $Q$-learning and policy gradient techniques.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For all its successes, Reinforcement Learning (RL) still struggles to deliver
formal guarantees on the closed-loop behavior of the learned policy. Among
other things, guaranteeing the safety of RL with respect to safety-critical
systems is a very active research topic. Some recent contributions propose to
rely on projections of the inputs delivered by the learned policy into a safe
set, ensuring that system safety is never jeopardized. Unfortunately, it is
unclear whether this operation can be performed without disrupting the learning
process. This paper addresses this issue. The problem is analysed in the
context of $Q$-learning and policy gradient techniques. We show that the
projection approach is generally disruptive in the context of $Q$-learning,
though a simple alternative solves the issue, while simple corrections can be
used in the context of policy gradient methods to ensure that the
policy gradients are unbiased. The proposed results extend to safe projections
based on robust MPC techniques.
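As a rough, hedged illustration of the projection idea described in the abstract, the sketch below projects a raw policy action onto a hypothetical box-shaped safe set and then performs a tabular $Q$-learning update indexed by the projected (i.e. actually applied) action. The box safe set, the discretised action grid, and the dict-based Q-table are illustrative assumptions and not the paper's setup; the paper's precise fix for $Q$-learning and its corrections for policy-gradient methods are derived formally and are not reproduced here.

```python
import numpy as np

# Hypothetical box-shaped safe set [-1, 1]; the paper considers general safe sets
# (e.g. defined via robust MPC), so the box is used purely for illustration.
A_MIN, A_MAX = -1.0, 1.0
ACTION_GRID = np.linspace(A_MIN, A_MAX, 21)  # candidate safe actions for the max

def project_to_safe_set(a):
    """Euclidean projection onto the box safe set (stand-in for a general safe-set projection)."""
    return float(np.clip(a, A_MIN, A_MAX))

def q_update(Q, s, a_raw, r, s_next, alpha=0.1, gamma=0.99):
    """Sketch of a Q-learning step indexed by the projected, actually applied action.

    The point illustrated is only that the update should be consistent with the safe
    action executed on the system; the paper's actual correction may differ in detail.
    """
    a_safe = project_to_safe_set(a_raw)
    a_key = round(a_safe, 2)  # coarse key for this tabular sketch
    best_next = max(Q.get((s_next, round(float(b), 2)), 0.0) for b in ACTION_GRID)
    td_error = r + gamma * best_next - Q.get((s, a_key), 0.0)
    Q[(s, a_key)] = Q.get((s, a_key), 0.0) + alpha * td_error
    return a_safe, Q
```

The design point of the sketch is simply that the learning update is driven by the safe action the system actually executes, which is the consistency question the abstract raises for naive projection.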
Related papers
- OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories [5.52395321369933]
This work addresses the problem of offline safe imitation learning (IL).
The goal is to learn safe and reward-maximizing policies from demonstrations that do not have per-timestep safety cost or reward information.
We propose a novel offline safe IL algorithm, OSIL, that infers safety from non-preferred demonstrations.
arXiv Detail & Related papers (2026-02-11T16:41:16Z) - Safety Assessment in Reinforcement Learning via Model Predictive Control [3.244287913152012]
We propose to leverage reversibility as a method for preventing safety issues throughout the training process.
Our method uses model-predictive path integral control to check the safety of an action proposed by a learned policy throughout training.
arXiv Detail & Related papers (2025-10-23T19:31:18Z) - A Provable Approach for End-to-End Safe Reinforcement Learning [17.17447653795906]
A longstanding goal in safe reinforcement learning (RL) is to ensure the safety of a policy throughout the entire process.
We propose a method, called Provably Lifetime Safe RL (PLS), that integrates offline safe RL with safe policy deployment.
arXiv Detail & Related papers (2025-05-28T00:48:20Z) - Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time.
We present a new, scalable method, which enjoys strict formal guarantees for Safe RL.
We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z) - Conservative Exploration for Policy Optimization via Off-Policy Policy
Evaluation [4.837737516460689]
We study the problem of conservative exploration, where the learner must be able to guarantee that its performance is at least as good as that of a baseline policy.
We propose the first conservative provably efficient model-free algorithm for policy optimization in continuous finite-horizon problems.
arXiv Detail & Related papers (2023-12-24T10:59:32Z) - Safe Reinforcement Learning via Probabilistic Logic Shields [14.996708092428447]
We introduce Probabilistic Logic Policy Gradient (PLPG).
PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions.
In our experiments, we show that PLPG learns safer and more rewarding policies compared to other state-of-the-art shielding techniques.
arXiv Detail & Related papers (2023-03-06T15:43:41Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - Learn Zero-Constraint-Violation Policy in Model-Free Constrained
Reinforcement Learning [7.138691584246846]
We propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions.
The safety index is designed to increase rapidly for potentially dangerous actions.
We claim that we can learn the energy function in a model-free manner similar to learning a value function.
arXiv Detail & Related papers (2021-11-25T07:24:30Z) - Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
arXiv Detail & Related papers (2021-06-16T20:28:56Z) - Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
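Several of the related papers above (the model-predictive path integral safety check, advantage-based intervention, safety-oriented energy functions, and probabilistic shields) share a common "safety filter" pattern: a learned policy proposes an action, a certificate decides whether that action is safe, and a backup action is applied otherwise. The sketch below shows only that generic pattern; all names, the one-step dynamics model, and the safety-index condition are placeholders and assumptions, not any listed paper's actual method or API.

```python
from typing import Any, Callable

def shielded_action(
    state: Any,
    proposed_action: Any,
    is_safe: Callable[[Any, Any], bool],   # safety certificate: rollout check, critic threshold, shield, ...
    backup_action: Callable[[Any], Any],   # safe fallback policy, e.g. from MPC or a prior controller
) -> Any:
    """Apply the learned policy's action only if the certificate accepts it;
    otherwise fall back to an action assumed to keep the system safe."""
    if is_safe(state, proposed_action):
        return proposed_action
    return backup_action(state)

def make_index_certificate(predict_next, phi, eta=0.01):
    """Toy certificate loosely inspired by safety-index / barrier-style conditions:
    accept an action if the predicted next state does not increase the index phi.
    The dynamics model, phi, and the margin eta are all illustrative assumptions."""
    def is_safe(state, action):
        next_state = predict_next(state, action)  # assumed one-step dynamics model
        return phi(next_state) <= max(phi(state) - eta, 0.0)
    return is_safe
```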
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.