Related papers: Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints

URL: http://arxiv.org/abs/2405.03005v1
Date: Sun, 5 May 2024 17:27:22 GMT
Title: Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints
Authors: Siow Meng Low, Akshat Kumar
Abstract summary: We design a safety model that performs credit assignment to assess contributions of partial state-action trajectories on safety. We derive an effective algorithm for optimizing a safe policy using the learned safety model. We devise a method to dynamically adapt the tradeoff coefficient between reward maximization and safety compliance.
Score: 15.904640266226023
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In safe Reinforcement Learning (RL), safety cost is typically defined as a function dependent on the immediate state and actions. In practice, safety constraints can often be non-Markovian due to the insufficient fidelity of state representation, and safety cost may not be known. We therefore address a general setting where safety labels (e.g., safe or unsafe) are associated with state-action trajectories. Our key contributions are: first, we design a safety model that specifically performs credit assignment to assess contributions of partial state-action trajectories on safety. This safety model is trained using a labeled safety dataset. Second, using RL-as-inference strategy we derive an effective algorithm for optimizing a safe policy using the learned safety model. Finally, we devise a method to dynamically adapt the tradeoff coefficient between reward maximization and safety compliance. We rewrite the constrained optimization problem into its dual problem and derive a gradient-based method to dynamically adjust the tradeoff coefficient during training. Our empirical results demonstrate that this approach is highly scalable and able to satisfy sophisticated non-Markovian safety constraints.
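
The dual-problem idea mentioned at the end of the abstract can be illustrated with a generic Lagrangian update. The sketch below is not the paper's algorithm: it is a minimal toy, assuming a scalar policy parameter `theta`, a hypothetical reward `-(theta - 2)^2`, a hypothetical safety cost equal to `theta`, and a cost budget of 1. The tradeoff coefficient `lam` is raised while the cost exceeds the budget and is clipped at zero otherwise.

```python
def dual_gradient_tradeoff(steps=5000, lr_theta=0.05, lr_lam=0.05, budget=1.0):
    """Toy primal-dual loop (illustrative only, not the paper's method).

    Problem (all quantities are assumptions made for this sketch):
        maximize   reward(theta) = -(theta - 2)^2
        subject to cost(theta)   = theta <= budget
    Lagrangian: L(theta, lam) = reward(theta) - lam * (cost(theta) - budget)
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to theta.
        grad_theta = -2.0 * (theta - 2.0) - lam
        theta += lr_theta * grad_theta
        # Dual step: raise lam when the (updated) cost exceeds the budget,
        # lower it otherwise, and project back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (theta - budget))
    return theta, lam


if __name__ == "__main__":
    theta, lam = dual_gradient_tradeoff()
    print(f"theta={theta:.4f}, tradeoff coefficient lam={lam:.4f}")
```

For this toy problem the constrained optimum is theta = 1 with multiplier lam = 2, which is the point the loop settles at. In an actual safe RL setting the gradients would come from sampled trajectories and a learned safety model rather than closed-form expressions.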

Related papers

Shape it Up! Restoring LLM Safety during Finetuning [66.46166656543761]
Finetuning large language models (LLMs) enables user-specific customization but introduces critical safety risks. We propose dynamic safety shaping (DSS), a framework that uses fine-grained safety signals to reinforce learning from safe segments of a response while suppressing unsafe content. We present STAR-DSS, guided by STAR scores, that robustly mitigates finetuning risks and delivers substantial safety improvements across diverse threats, datasets, and model families.
arXiv Detail & Related papers (2025-05-22T18:05:16Z)
TraCeS: Trajectory Based Credit Assignment From Sparse Safety Feedback [15.904640266226023]
In safe reinforcement learning (RL), auxiliary safety costs are used to align the agent to safe decision making. In practice, safety constraints, including cost functions and budgets, are unknown or hard to specify. We address a general setting where the true safety definition is unknown, and has to be learned from sparsely labeled data.
arXiv Detail & Related papers (2025-04-17T01:11:08Z)
Safe Vision-Language Models via Unsafe Weights Manipulation [75.04426753720551]
We revise safety evaluation by introducing Safe-Ground, a new set of metrics that evaluate safety at different levels of granularity. We take a different direction and explore whether it is possible to make a model safer without training, introducing Unsafe Weights Manipulation (UWM). UWM uses a calibration set of safe and unsafe instances to compare activations between safe and unsafe content, identifying the most important parameters for processing the latter.
arXiv Detail & Related papers (2025-03-14T17:00:22Z)
Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time. We present a new, scalable method, which enjoys strict formal guarantees for Safe RL. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z)
Vulnerability Mitigation for Safety-Aligned Language Models via Debiasing [12.986006070964772]
Safety alignment is an essential research topic for real-world AI applications. Our study first identified the difficulty of eliminating such vulnerabilities without sacrificing the model's helpfulness. Our method could enhance the model's helpfulness while maintaining safety, thus improving the trade-off-front.
arXiv Detail & Related papers (2025-02-04T09:31:54Z)
Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety in Reinforcement Learning (RL). We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z)
Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery [13.333197887318168]
Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks. We propose a method to construct a boundary that discriminates safe and unsafe states. Our approach has better task performance with fewer safety violations than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-06-24T12:02:50Z)
Safe MDP Planning by Learning Temporal Patterns of Undesirable Trajectories and Averting Negative Side Effects [27.41101006357176]
In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects. Operating based on an incomplete model can often produce unintended negative side effects (NSEs).
arXiv Detail & Related papers (2023-04-06T14:03:24Z)
Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL. We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection. To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation [3.5884936187733394]
We address the problem of safe reinforcement learning from pixel observations. We formalize the problem in a constrained, partially observable Markov decision process framework. We employ a novel safety critic using the stochastic latent actor-critic (SLAC) approach.
arXiv Detail & Related papers (2022-10-02T19:55:42Z)
Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy. Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial. Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size. We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques. We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning. We derive policies for scheduling the safety budget during training. We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
Fail-Safe Adversarial Generative Imitation Learning [9.594432031144716]
We propose a safety layer that enables a closed-form probability density/gradient of the safe generative continuous policy, end-to-end generative adversarial training, and worst-case safety guarantees. The safety layer maps all actions into a set of safe actions, and uses the change-of-variables formula plus additivity of measures for the density. In an experiment on real-world driver interaction data, we empirically demonstrate tractability, safety and imitation performance of our approach.
arXiv Detail & Related papers (2022-03-03T13:03:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.