ProSh: Probabilistic Shielding for Model-free Reinforcement Learning
- URL: http://arxiv.org/abs/2510.15720v2
- Date: Tue, 21 Oct 2025 10:10:58 GMT
- Title: ProSh: Probabilistic Shielding for Model-free Reinforcement Learning
- Authors: Edwin Hamel-De le Court, Gaspard Ohlmann, Francesco Belardinelli
- Abstract summary: We introduce Probabilistic Shielding via Risk Augmentation (ProSh), a model-free algorithm for safe reinforcement learning under cost constraints. ProSh enforces safety by applying a shield to the agent's policy distribution using a learned cost critic. We show that optimality is preserved when the environment is deterministic.
- Score: 11.125968799758434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety is a major concern in reinforcement learning (RL): we aim to develop RL systems that not only perform optimally, but are also safe to deploy, with formal guarantees about their safety. To this end, we introduce Probabilistic Shielding via Risk Augmentation (ProSh), a model-free algorithm for safe reinforcement learning under cost constraints. ProSh augments the Constrained MDP state space with a risk budget and enforces safety by applying a shield to the agent's policy distribution using a learned cost critic. The shield ensures that all sampled actions remain safe in expectation. We also show that optimality is preserved when the environment is deterministic. Since ProSh is model-free, safety during training depends on the knowledge acquired about the environment. We provide a tight upper bound on the expected cost, depending only on the backup-critic accuracy, that always holds during training. Under mild, practically achievable assumptions, ProSh guarantees safety even at training time, as shown in the experiments.
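To make the shielding idea concrete, here is a minimal sketch of how a risk-augmented shield might gate a policy's action distribution with a learned cost critic. All names (`shielded_sample`, the budget-update rule in the comment) are illustrative assumptions, not ProSh's actual implementation:

```python
import numpy as np

def shielded_sample(policy_probs, cost_q, budget, rng=None):
    """Sample an action whose expected cost fits the remaining risk budget.

    policy_probs: agent's action distribution pi(a | s, b) (1-D array).
    cost_q:       learned cost-critic estimates Q_c(s, a) (1-D array).
    budget:       scalar risk budget carried in the augmented state.
    """
    rng = rng or np.random.default_rng()
    safe = cost_q <= budget                       # actions safe in expectation
    if not safe.any():                            # no safe action: fall back
        safe[np.argmin(cost_q)] = True            #   to the least-costly one
    shielded = np.where(safe, policy_probs, 0.0)  # shield: zero out unsafe mass
    total = shielded.sum()
    shielded = shielded / total if total > 0 else safe / safe.sum()
    # On transition, one plausible budget update is b' = (b - observed_cost) / gamma.
    return rng.choice(len(shielded), p=shielded), shielded
```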
Related papers
- Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models [57.006252510102506]
Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but often lacks provable guarantees for safety-critical applications. We introduce a novel recovery-based shielding framework that enables safe RL with a provable safety lower bound for unknown and non-linear continuous dynamical systems.
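As a rough illustration of recovery-based shielding with a GP dynamics model, the sketch below takes the task action only when a pessimistic GP prediction of the next state stays inside the safe set, and otherwise falls back to a recovery action. The toy 1-D system, the `beta` confidence margin, and all names are assumptions for illustration, not the paper's construction:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X_MAX = 1.0  # hypothetical 1-D safe set: state x is safe iff x <= X_MAX

def shielded_action(gp, x, task_action, recovery_action, beta=2.0):
    """Use the task action only if a pessimistic GP prediction of the
    next state stays in the safe set; otherwise recover."""
    mean, std = gp.predict(np.array([[x, task_action]]), return_std=True)
    if mean[0] + beta * std[0] <= X_MAX:   # safe with high confidence
        return task_action
    return recovery_action                 # hand control to recovery policy

# Toy usage: fit the GP on (state, action) -> next-state transitions.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 0.9 * X[:, 0] + 0.1 * X[:, 1]          # stand-in linear dynamics
gp = GaussianProcessRegressor().fit(X, y)
print(shielded_action(gp, x=0.8, task_action=1.0, recovery_action=-1.0))
```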
arXiv Detail & Related papers (2026-02-12T22:03:35Z) - Predictive Safety Shield for Dyna-Q Reinforcement Learning [0.0]
We propose a predictive safety shield for model-based reinforcement learning agents in discrete spaces. Our safety shield updates the Q-function locally based on safe predictions, which originate from a safe simulation of the environment model. Our experiments on gridworld environments demonstrate that even short prediction horizons can be sufficient to identify the optimal path.
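A minimal tabular sketch of the idea as summarized above: roll the model forward a few steps and, if the rollout reaches an unsafe state, locally overwrite the Q-value of the root action. The `model` interface, penalty value, and greedy rollout policy are illustrative assumptions, not the paper's algorithm:

```python
def predictive_shield(Q, model, s, a, n_actions, horizon=3, penalty=-100.0):
    """Penalise Q[(s, a)] if a short rollout of the (assumed-safe) model
    reaches an unsafe state; tabular sketch for discrete spaces.

    model: callable (state, action) -> (next_state, is_unsafe).
    """
    state, action = s, a
    for _ in range(horizon):
        state, unsafe = model(state, action)
        if unsafe:
            Q[(s, a)] = penalty   # local update: mark the root action unsafe
            return True
        # continue the rollout greedily w.r.t. the current Q-function
        action = max(range(n_actions), key=lambda b: Q.get((state, b), 0.0))
    return False
```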
arXiv Detail & Related papers (2025-11-26T15:59:55Z) - Rethinking Safety in LLM Fine-tuning: An Optimization Perspective [56.31306558218838]
We show that poor optimization choices, rather than inherent trade-offs, often cause safety problems, measured as harmful responses to adversarial prompts. We propose a simple exponential moving average (EMA) momentum technique in parameter space that preserves safety performance. Our experiments on the Llama families across multiple datasets demonstrate that safety problems can largely be avoided without specialized interventions.
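EMA over model parameters is a standard technique; a minimal PyTorch sketch of the update (the paper's exact decay schedule and where it is applied are not reproduced here):

```python
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    """theta_ema <- decay * theta_ema + (1 - decay) * theta."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
```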
arXiv Detail & Related papers (2025-08-17T23:46:36Z) - Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards [23.15178050525514]
Safe Reinforcement Learning (Safe RL) aims to train an RL agent to maximize its performance in real-world environments while adhering to safety constraints. We propose a novel safe RL approach called Safety Modulated Policy Optimization (SMPO), which enables safe policy function learning.
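The abstract does not spell out the modulation rule, so the sketch below shows the generic Lagrangian-style penalty often used for cost-modulated rewards in safe RL, together with a dual-ascent multiplier update; treat it as one plausible instantiation, not SMPO itself:

```python
def modulated_reward(reward, cost, lam):
    """Generic Lagrangian-style penalty: r' = r - lam * c."""
    return reward - lam * cost

def update_lambda(lam, avg_episode_cost, cost_budget, lr=0.01):
    """Dual ascent: raise the penalty while cost exceeds the budget."""
    return max(0.0, lam + lr * (avg_episode_cost - cost_budget))
```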
arXiv Detail & Related papers (2025-04-03T21:35:22Z) - Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time. We present a new, scalable method, which enjoys strict formal guarantees for Safe RL. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z) - Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments [63.053364805943026]
We extend the approximate model-based shielding framework to the continuous setting.
In particular, we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms.
arXiv Detail & Related papers (2024-02-01T17:55:08Z) - Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees [14.107064796593225]
We introduce a safe reinforcement learning framework that incorporates robustness through the use of an optimal transport cost uncertainty set.
In experiments on continuous control tasks with safety constraints, our approach demonstrates robust performance while significantly improving safety at deployment time.
arXiv Detail & Related papers (2023-01-31T02:39:52Z) - ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems.
A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error.
While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
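A schematic of such a least-restrictive runtime safety filter, assuming a learned safety critic whose sign convention and threshold are illustrative choices rather than the paper's:

```python
def safety_filter(state, task_action, fallback_policy, safety_critic,
                  threshold=0.0):
    """Least-restrictive filter: keep the task action while the learned
    safety critic says the fallback can still recover afterwards.

    Convention assumed here: safety_critic(s, a) >= threshold means safe.
    """
    if safety_critic(state, task_action) >= threshold:
        return task_action
    return fallback_policy(state)   # otherwise defer to the fallback policy
```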
arXiv Detail & Related papers (2022-12-06T18:53:34Z) - Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training.
We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
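A minimal sketch of safety-state augmentation in a Gym-style wrapper: the remaining safety budget is tracked and appended to the observation. The wrapper API and the cost field in `info` are assumptions; Simmer's budget-scheduling policy itself is not shown:

```python
import numpy as np

class SafetyBudgetWrapper:
    """Track the remaining safety budget and append it to the observation."""

    def __init__(self, env, budget):
        self.env = env
        self.budget0 = budget
        self.budget = budget

    def reset(self):
        self.budget = self.budget0
        return np.append(self.env.reset(), self.budget)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.budget -= info.get("cost", 0.0)   # spend budget on incurred cost
        return np.append(obs, self.budget), reward, done, info
```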
arXiv Detail & Related papers (2022-06-06T15:23:07Z) - Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees [7.6347172725540995]
Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world.
We propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution.
arXiv Detail & Related papers (2022-01-20T18:41:01Z) - Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
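One plausible reading of exploration gated by a conservative safety critic, sketched below; the failure-probability interpretation and the threshold `eps_safe` are illustrative assumptions:

```python
import numpy as np

def safe_explore(q_values, failure_probs, eps_safe=0.1):
    """Act greedily, but only among actions a conservative critic deems safe.

    failure_probs[a]: conservatively over-estimated failure chance of action a.
    """
    safe = np.flatnonzero(failure_probs <= eps_safe)
    if safe.size == 0:                               # nothing deemed safe:
        return int(np.argmin(failure_probs))         #   take the least risky
    return int(safe[np.argmax(q_values[safe])])      # greedy over the safe set
```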
arXiv Detail & Related papers (2020-10-27T17:54:25Z)