Safe Reinforcement Learning via Probabilistic Logic Shields
- URL: http://arxiv.org/abs/2303.03226v1
- Date: Mon, 6 Mar 2023 15:43:41 GMT
- Title: Safe Reinforcement Learning via Probabilistic Logic Shields
- Authors: Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt
- Abstract summary: We introduce Probabilistic Logic Policy Gradient (PLPG)
PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions.
In our experiments, we show that PLPG learns safer and more rewarding policies compared to other state-of-the-art shielding techniques.
- Score: 14.996708092428447
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safe Reinforcement learning (Safe RL) aims at learning optimal policies while
staying safe. A popular solution to Safe RL is shielding, which uses a logical
safety specification to prevent an RL agent from taking unsafe actions.
However, traditional shielding techniques are difficult to integrate with
continuous, end-to-end deep RL methods. To this end, we introduce Probabilistic
Logic Policy Gradient (PLPG). PLPG is a model-based Safe RL technique that uses
probabilistic logic programming to model logical safety constraints as
differentiable functions. Therefore, PLPG can be seamlessly applied to any
policy gradient algorithm while still providing the same convergence
guarantees. In our experiments, we show that PLPG learns safer and more
rewarding policies compared to other state-of-the-art shielding techniques.
Related papers
- Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models [57.006252510102506]
Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but often lacks provable guarantees for safety-critical applications.<n>We introduce a novel recovery-based shielding framework that enables safe RL with a provable safety lower bound for unknown and non-linear continuous dynamical systems.
arXiv Detail & Related papers (2026-02-12T22:03:35Z) - A Provable Approach for End-to-End Safe Reinforcement Learning [17.17447653795906]
A longstanding goal in safe reinforcement learning (RL) is to ensure the safety of a policy throughout the entire process.<n>We propose a method, called Provably Lifetime Safe RL (PLS), that integrates offline safe RL with safe policy deployment.
arXiv Detail & Related papers (2025-05-28T00:48:20Z) - Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time.
We present a new, scalable method, which enjoys strict formal guarantees for Safe RL.
We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z) - Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning [7.349727826230864]
We present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents.
The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function.
We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95% pm 9%$ cumulative reward.
arXiv Detail & Related papers (2024-05-04T20:59:06Z) - Leveraging Approximate Model-based Shielding for Probabilistic Safety
Guarantees in Continuous Environments [63.053364805943026]
We extend the approximate model-based shielding framework to the continuous setting.
In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms.
arXiv Detail & Related papers (2024-02-01T17:55:08Z) - Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z) - Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provable safe RL when given access to an offline oracle providing binary feedback on the safety of state, action pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z) - Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
arXiv Detail & Related papers (2021-06-16T20:28:56Z) - Model-Based Actor-Critic with Chance Constraint for Stochastic System [6.600423613245076]
We propose a model-based chance constrained actor-critic (CCAC) algorithm which can efficiently learn a safe and non-conservative policy.
CCAC directly solves the original chance constrained problems, where the objective function and safe probability is simultaneously optimized with adaptive weights.
arXiv Detail & Related papers (2020-12-19T15:46:50Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.