Runtime Safety through Adaptive Shielding: From Hidden Parameter Inference to Provable Guarantees
- URL: http://arxiv.org/abs/2506.11033v1
- Date: Tue, 20 May 2025 23:45:45 GMT
- Title: Runtime Safety through Adaptive Shielding: From Hidden Parameter Inference to Provable Guarantees
- Authors: Minjae Kwon, Tyler Ingebrand, Ufuk Topcu, Lu Feng
- Abstract summary: Variations in hidden parameters, such as a robot's mass distribution or friction, pose safety risks during execution. We develop a runtime shielding mechanism for reinforcement learning. We prove that the proposed mechanism satisfies probabilistic safety guarantees and yields optimal policies.
- Score: 17.670635109868854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variations in hidden parameters, such as a robot's mass distribution or friction, pose safety risks during execution. We develop a runtime shielding mechanism for reinforcement learning, building on the formalism of constrained hidden-parameter Markov decision processes. Function encoders enable real-time inference of hidden parameters from observations, allowing the shield and the underlying policy to adapt online. The shield constrains the action space by forecasting future safety risks (such as obstacle proximity) and accounts for uncertainty via conformal prediction. We prove that the proposed mechanism satisfies probabilistic safety guarantees and yields optimal policies among the set of safety-compliant policies. Experiments across diverse environments with varying hidden parameters show that our method significantly reduces safety violations and achieves strong out-of-distribution generalization, while incurring minimal runtime overhead.
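To make the described pipeline concrete, the sketch below illustrates, under stated assumptions, the two runtime components the abstract names: estimating function-encoder coefficients from a few observed transitions as a stand-in for hidden-parameter inference, and masking actions whose conformal-prediction-inflated risk forecast exceeds a budget. All names (`infer_hidden_params`, `conformal_quantile`, `shielded_action`, `risk_budget`) and the concrete forms are illustrative assumptions, not the authors' implementation.

```python
"""Minimal sketch of the runtime shielding loop described in the abstract.

Assumptions: the function-encoder representation, the risk forecaster,
and every name below are hypothetical illustrations of the described
pipeline, not the paper's code.
"""
import numpy as np


def infer_hidden_params(basis_outputs, observed_targets, reg=1e-3):
    """Ridge least-squares coefficients over fixed basis functions.

    Function encoders represent an unknown dynamics function as a linear
    combination of learned basis functions, so at runtime only the
    coefficients must be estimated from a few observed transitions.
    basis_outputs: (n_transitions, n_basis) predictions of each basis fn.
    observed_targets: (n_transitions,) observed next-state targets.
    """
    A, b = basis_outputs, observed_targets
    coeffs = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ b)
    return coeffs  # online estimate standing in for the hidden parameters


def conformal_quantile(calibration_residuals, alpha=0.1):
    """Split-conformal quantile bounding forecast error with prob. 1 - alpha."""
    n = len(calibration_residuals)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(calibration_residuals)[min(rank, n) - 1]


def shielded_action(candidate_actions, q_value, risk_forecast, quantile,
                    risk_budget):
    """Pick the highest-value action whose inflated risk stays within budget.

    risk_forecast(a): predicted future safety risk (e.g. obstacle proximity).
    The conformal quantile inflates the forecast to account for uncertainty.
    """
    safe = [a for a in candidate_actions
            if risk_forecast(a) + quantile <= risk_budget]
    if not safe:  # fall back to the least risky action if none qualifies
        return min(candidate_actions, key=risk_forecast)
    return max(safe, key=q_value)
```

In the paper's setting the shield also reasons over a forecast horizon and the structure of the constrained hidden-parameter MDP; the sketch only shows the single-step shape of the decision rule.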
Related papers
- Distributed Risk-Sensitive Safety Filters for Uncertain Discrete-Time Systems [39.53920064972246]
We propose a novel risk-sensitive safety filter for discrete-time multi-agent systems with uncertain dynamics. Our approach relies on centralized risk-sensitive safety conditions based on exponential risk operators to ensure robustness against model uncertainties.
arXiv Detail & Related papers (2025-06-09T01:48:25Z) - Shape it Up! Restoring LLM Safety during Finetuning [66.46166656543761]
Finetuning large language models (LLMs) enables user-specific customization but introduces critical safety risks. We propose dynamic safety shaping (DSS), a framework that uses fine-grained safety signals to reinforce learning from safe segments of a response while suppressing unsafe content. We present STAR-DSS, guided by STAR scores, which robustly mitigates finetuning risks and delivers substantial safety improvements across diverse threats, datasets, and model families.
arXiv Detail & Related papers (2025-05-22T18:05:16Z) - Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time. We present a new, scalable method, which enjoys strict formal guarantees for Safe RL. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z) - On Almost Surely Safe Alignment of Large Language Models at Inference-Time [20.5164976103514]
We introduce a novel inference-time alignment approach for LLMs that aims to generate safe responses almost surely. We augment the model's latent space with a safety state that tracks the evolution of safety constraints and dynamically penalizes unsafe generations. We demonstrate formal safety guarantees w.r.t. the given cost model upon solving the MDP in the latent space with sufficiently large penalties.
arXiv Detail & Related papers (2025-02-03T09:59:32Z) - Realizable Continuous-Space Shields for Safe Reinforcement Learning [13.728961635717134]
We present the first shielding approach specifically designed to ensure the satisfaction of safety requirements in continuous state and action spaces. Our method builds upon realizability, an essential property that confirms the shield will always be able to generate a safe action for any state in the environment.
arXiv Detail & Related papers (2024-10-02T21:08:11Z) - Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z) - Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z) - SafeDiffuser: Safe Planning with Diffusion Probabilistic Models [97.80042457099718]
Diffusion model-based approaches have shown promise in data-driven planning, but they lack safety guarantees.
We propose a new method, called SafeDiffuser, to ensure diffusion probabilistic models satisfy specifications.
We test our method on a series of safe planning tasks, including maze path generation, legged robot locomotion, and 3D space manipulation.
arXiv Detail & Related papers (2023-05-31T19:38:12Z) - ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems.
A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error.
While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
arXiv Detail & Related papers (2022-12-06T18:53:34Z) - Meta-Learning Priors for Safe Bayesian Optimization [72.8349503901712]
We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity.
As a core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner.
On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches.
arXiv Detail & Related papers (2022-10-03T08:38:38Z) - ProBF: Learning Probabilistic Safety Certificates with Barrier Functions [31.203344483485843]
The control barrier function is a useful tool to guarantee safety if we have access to the ground-truth system dynamics.
In practice, we have inaccurate knowledge of the system dynamics, which can lead to unsafe behaviors.
We show the efficacy of this method through experiments on Segway and Quadrotor simulations.
arXiv Detail & Related papers (2021-12-22T20:18:18Z) - Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z)