Safety Assessment in Reinforcement Learning via Model Predictive Control
- URL: http://arxiv.org/abs/2510.20955v1
- Date: Thu, 23 Oct 2025 19:31:18 GMT
- Title: Safety Assessment in Reinforcement Learning via Model Predictive Control
- Authors: Jeff Pflueger, Michael Everett
- Abstract summary: We propose to leverage reversibility as a method for preventing safety issues throughout the training process. Our method uses model-predictive path integral control to check the safety of an action proposed by a learned policy throughout training.
- Score: 3.244287913152012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-free reinforcement learning approaches are promising for control but typically lack formal safety guarantees. Existing methods to shield or otherwise provide these guarantees often rely on detailed knowledge of the safety specifications. Instead, this work's insight is that many difficult-to-specify safety issues are best characterized by invariance. Accordingly, we propose to leverage reversibility as a method for preventing these safety issues throughout the training process. Our method uses model-predictive path integral control to check the safety of an action proposed by a learned policy throughout training. A key advantage of this approach is that it only requires the ability to query the black-box dynamics, not explicit knowledge of the dynamics or safety constraints. Experimental results demonstrate that the proposed algorithm successfully aborts before all unsafe actions, while still achieving comparable training progress to a baseline PPO approach that is allowed to violate safety.
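The abstract describes the core loop: before executing an action proposed by the learned policy, a sampling-based controller queries the black-box dynamics to test whether the resulting state is reversible, and aborts otherwise. Below is a minimal illustrative sketch of that kind of reversibility check; the `step` simulator interface, tolerance, and plain random-shooting rollouts are assumptions for illustration (the paper uses model-predictive path integral control, which would importance-weight rollouts by exponentiated trajectory cost).

```python
import numpy as np

def reversibility_check(state, action, step, n_rollouts=64, horizon=20,
                        noise_std=0.5, tol=0.1):
    """Sampling-based reversibility check (illustrative sketch).

    `step(state, action) -> next_state` is a black-box dynamics query.
    After taking `action`, we test whether some sampled action sequence
    can drive the system back near the starting state within `horizon`
    steps; if none can, the action is flagged as unsafe.
    """
    start = np.asarray(state, dtype=float)
    next_state = step(start, np.asarray(action, dtype=float))
    best_dist = np.inf
    for _ in range(n_rollouts):
        s = next_state
        for _ in range(horizon):
            # Random candidate return action; MPPI would instead weight
            # these rollouts by an exponentiated trajectory cost.
            a = noise_std * np.random.randn(*np.shape(action))
            s = step(s, a)
            best_dist = min(best_dist, float(np.linalg.norm(s - start)))
    return best_dist < tol  # reversible => treat the proposed action as safe
```

During training, the policy's action would be executed only when this check passes; otherwise the episode is aborted before the unsafe action is taken.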
Related papers
- Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models [57.006252510102506]
Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but often lacks provable guarantees for safety-critical applications. We introduce a novel recovery-based shielding framework that enables safe RL with a provable safety lower bound for unknown and non-linear continuous dynamical systems.
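As an illustration of the recovery-based shielding idea (not the paper's exact construction), a shield can consult a learned Gaussian process dynamics model and hand control to a recovery policy whenever the predicted next state is not safe with high confidence; `gp_predict`, `safe_lower`, and the crude confidence margin below are assumptions for the sketch.

```python
def shielded_action(state, policy_action, recovery_policy, gp_predict,
                    safe_lower, confidence_mult=2.0):
    """Recovery-based shield sketch around a GP dynamics model.

    `gp_predict(state, action) -> (mean, std)` is the learned model;
    `safe_lower(x) >= 0` indicates x is inside the safe set. The paper
    derives a provable safety lower bound; this sketch substitutes a
    crude elementwise confidence margin.
    """
    mean, std = gp_predict(state, policy_action)
    pessimistic = mean - confidence_mult * std  # pessimistic next state
    if safe_lower(pessimistic) >= 0.0:
        return policy_action       # safe even under the pessimistic model
    return recovery_policy(state)  # otherwise switch to recovery control
```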
arXiv Detail & Related papers (2026-02-12T22:03:35Z) - Safely Learning Controlled Stochastic Dynamics [61.82896036131116]
We introduce a method that ensures safe exploration and efficient estimation of system dynamics. After training, the learned model enables predictions of the system's dynamics and permits safety verification of any given control. We provide theoretical guarantees for safety and derive adaptive learning rates that improve with increasing Sobolev regularity of the true dynamics.
arXiv Detail & Related papers (2025-06-03T11:17:07Z) - Safe Vision-Language Models via Unsafe Weights Manipulation [75.04426753720551]
We revise safety evaluation by introducing Safe-Ground, a new set of metrics that evaluate safety at different levels of granularity. We take a different direction and explore whether it is possible to make a model safer without training, introducing Unsafe Weights Manipulation (UWM). UWM uses a calibration set of safe and unsafe instances to compare activations between safe and unsafe content, identifying the most important parameters for processing the latter.
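A minimal PyTorch-style sketch of the calibration-set idea follows; the hook-based activation capture and the activation-difference score are assumptions for illustration, not the paper's exact UWM criterion.

```python
import torch

@torch.no_grad()
def unsafe_weight_scores(model, safe_batch, unsafe_batch, layer):
    """Score one linear layer's weights by how differently its inputs
    activate on safe vs. unsafe calibration data (illustrative only)."""
    acts = {}

    def hook(_module, inputs, _output):
        acts["x"] = inputs[0].detach().mean(dim=0)  # mean input activation

    handle = layer.register_forward_hook(hook)
    model(safe_batch)
    safe_act = acts["x"]
    model(unsafe_batch)
    unsafe_act = acts["x"]
    handle.remove()

    # Weights multiplying strongly unsafe-specific activations score high
    # and are candidates for manipulation.
    diff = (unsafe_act - safe_act).abs()   # per input feature
    return layer.weight.abs() * diff       # broadcast over output rows
```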
arXiv Detail & Related papers (2025-03-14T17:00:22Z) - Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time. We present a new, scalable method, which enjoys strict formal guarantees for Safe RL. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z) - Vulnerability Mitigation for Safety-Aligned Language Models via Debiasing [12.986006070964772]
Safety alignment is an essential research topic for real-world AI applications. Our study first identified the difficulty of eliminating such vulnerabilities without sacrificing the model's helpfulness. Our method could enhance the model's helpfulness while maintaining safety, thus improving the trade-off front.
arXiv Detail & Related papers (2025-02-04T09:31:54Z) - Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess contributions of partial state-action trajectories on safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
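The dynamic tradeoff adaptation in the last point is reminiscent of a dual-ascent (Lagrangian) update; the sketch below shows that standard scheme under assumed names, not the paper's specific rule.

```python
def update_tradeoff(lmbda, avg_safety_cost, cost_limit, lr=0.01):
    """Dual-ascent-style update of the safety tradeoff coefficient:
    raise the penalty when the observed safety cost exceeds its budget,
    and relax it (never below zero) when the policy is comfortably safe.
    The policy then maximizes E[reward] - lmbda * E[safety_cost]."""
    lmbda += lr * (avg_safety_cost - cost_limit)
    return max(lmbda, 0.0)
```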
arXiv Detail & Related papers (2024-05-05T17:27:22Z) - Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z) - Barrier Certified Safety Learning Control: When Sum-of-Square Programming Meets Reinforcement Learning [0.0]
This work adopts control barrier functions on top of reinforcement learning and proposes a compensating algorithm to fully maintain safety.
Compared to quadratic programming-based reinforcement learning methods, our sum-of-squares programming-based reinforcement learning demonstrates superior performance.
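The barrier condition being certified can be illustrated with a much simpler discrete-time check (the paper certifies the barrier function via sum-of-squares programming rather than by simulation; `step`, `h`, and `alpha` below are assumed names):

```python
def cbf_filter(state, action, step, h, alpha=0.1, fallback=None):
    """Discrete-time control barrier function filter (illustrative).

    h(x) >= 0 defines the safe set. The action is accepted when
    h(x') >= (1 - alpha) * h(x), i.e. the barrier value decays slowly
    enough that the state cannot leave the safe set in one step.
    """
    if h(step(state, action)) >= (1.0 - alpha) * h(state):
        return action
    return fallback(state) if fallback is not None else None
```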
arXiv Detail & Related papers (2022-06-16T04:38:50Z) - Safe Reinforcement Learning with Chance-constrained Model Predictive Control [10.992151305603267]
Real-world reinforcement learning (RL) problems often demand that agents behave safely by obeying a set of designed constraints.
We address the challenge of safe RL by coupling a safety guide based on model predictive control (MPC) with a modified policy gradient framework.
We show theoretically that this penalty allows for the safety guide to be removed after training and illustrate our method using experiments with a simulator quadrotor.
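A rough sketch of that coupling, under assumed interfaces (`mpc_guide` projecting actions into the chance-constrained feasible set, a classic Gym-style `env.step`), might look as follows; penalizing interventions is what lets the guide be discarded after training:

```python
def guided_step(env, state, policy, mpc_guide, penalty=1.0):
    """One environment step with an MPC safety guide (illustrative).

    `mpc_guide(state, action) -> (safe_action, was_modified)` replaces
    unsafe proposals. Penalizing interventions pushes the policy to
    propose safe actions itself, so the guide can be removed later."""
    action = policy(state)
    safe_action, was_modified = mpc_guide(state, action)
    next_state, reward, done, _info = env.step(safe_action)
    if was_modified:
        reward -= penalty  # discourage reliance on the safety guide
    return next_state, reward, done
```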
arXiv Detail & Related papers (2021-12-27T23:47:45Z) - Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
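One way to read the approach, as a hedged sketch with assumed names: a conservative critic `risk_critic(state, action)` overestimates the probability of catastrophic failure, and candidate actions are resampled from the policy until one falls below a risk threshold.

```python
def select_safe_action(state, sample_action, risk_critic, eps=0.1,
                       max_tries=20):
    """Conservative-safety-critic style action selection (illustrative).

    Resample from the policy until the (conservatively overestimated)
    failure probability drops below `eps`; otherwise fall back to the
    least risky action seen."""
    best_action, best_risk = None, float("inf")
    for _ in range(max_tries):
        action = sample_action(state)
        risk = risk_critic(state, action)
        if risk <= eps:
            return action
        if risk < best_risk:
            best_action, best_risk = action, risk
    return best_action  # no candidate met the threshold; least risky wins
```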
arXiv Detail & Related papers (2020-10-27T17:54:25Z)