Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies
- URL: http://arxiv.org/abs/2409.10218v1
- Date: Mon, 16 Sep 2024 12:13:41 GMT
- Title: Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies
- Authors: Dennis Gross, Helge Spieker
- Abstract summary: Pruning neural networks (NNs) can streamline them but risks removing vital parameters from safe reinforcement learning (RL) policies.
We introduce an interpretable RL method called VERINTER, which combines NN pruning with model checking to ensure interpretable RL safety.
- Score: 5.923818043882103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning neural networks (NNs) can streamline them but risks removing vital parameters from safe reinforcement learning (RL) policies. We introduce an interpretable RL method called VERINTER, which combines NN pruning with model checking to ensure interpretable RL safety. VERINTER exactly quantifies the effects of pruning and the impact of neural connections on complex safety properties by analyzing changes in safety measurements. This method maintains safety in pruned RL policies and enhances understanding of their safety dynamics, which has proven effective in multiple RL settings.
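The abstract itself contains no code, but the pruning-then-verification loop it describes can be illustrated roughly. The sketch below is an assumption about how such a loop might look in PyTorch, not the paper's artifact: it prunes the smallest-magnitude weights of each linear layer and compares a safety measurement before and after, with `safety_measurement` as a hypothetical placeholder for the model-checking step (e.g., the verified probability of reaching an unsafe state).
```python
import copy
import torch.nn as nn
import torch.nn.utils.prune as prune


def safety_measurement(policy: nn.Module) -> float:
    """Hypothetical placeholder for the model-checking step: it would build the
    Markov chain induced by the policy in the environment model and return a
    verified safety value, e.g. the probability of reaching an unsafe state."""
    raise NotImplementedError("plug in a model-checking backend here")


def prune_and_measure(policy: nn.Module, amount: float = 0.2):
    """Prune the smallest-magnitude weights of every linear layer and report
    how the verified safety measurement changes as a result."""
    baseline = safety_measurement(policy)

    pruned = copy.deepcopy(policy)
    for module in pruned.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the pruning mask into the weights

    return pruned, baseline, safety_measurement(pruned) - baseline
```
The returned difference corresponds to the kind of change in safety measurement that the abstract says VERINTER quantifies for individual neural connections.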
Related papers
- Reward-Safety Balance in Offline Safe RL via Diffusion Regularization [16.5825143820431]
Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints.
We propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL).
DRCORL first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference.
arXiv Detail & Related papers (2025-02-18T00:00:03Z)
- Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies [5.923818043882103]
Deep reinforcement learning (RL) policies can demonstrate unsafe behaviors and are challenging to interpret.
We combine RL policy model checking and co-activation graph analysis.
This combination lets us interpret the RL policy's inner workings for safe decision-making.
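A minimal sketch of the co-activation side of this combination, assuming an MLP policy in PyTorch (illustrative only, not the paper's implementation): record hidden activations over a batch of states and build a neuron-by-neuron correlation matrix, whose strong entries are the edges of a co-activation graph.
```python
import numpy as np
import torch
import torch.nn as nn


def co_activation_matrix(policy: nn.Module, states: torch.Tensor) -> np.ndarray:
    """Capture hidden activations of an MLP policy over a batch of states and
    return the neuron-by-neuron Pearson correlation matrix; strongly correlated
    pairs form the edges of a co-activation graph."""
    captured = []

    def hook(_module, _inputs, output):
        captured.append(output.detach())

    # Assumes the policy exposes its nonlinearities as nn.ReLU modules.
    handles = [m.register_forward_hook(hook)
               for m in policy.modules() if isinstance(m, nn.ReLU)]

    with torch.no_grad():
        policy(states)
    for h in handles:
        h.remove()

    acts = torch.cat(captured, dim=1).cpu().numpy()  # [num_states, num_neurons]
    return np.corrcoef(acts, rowvar=False)           # [num_neurons, num_neurons]
```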
arXiv Detail & Related papers (2025-01-06T17:07:44Z)
- xSRL: Safety-Aware Explainable Reinforcement Learning -- Safety as a Product of Explainability [8.016667413960995]
We propose xSRL, a framework that integrates both local and global explanations to provide a comprehensive understanding of RL agents' behavior.
xSRL also enables developers to identify policy vulnerabilities through adversarial attacks, offering tools to debug and patch agents without retraining.
Our experiments and user studies demonstrate xSRL's effectiveness in increasing safety in RL systems, making them more reliable and trustworthy for real-world deployment.
arXiv Detail & Related papers (2024-12-26T18:19:04Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
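As a rough illustration only: the snippet below estimates such a susceptibility rate empirically, by sampling bounded perturbations and counting how often the greedy action flips. The paper derives its Adversarial Rate via formal verification rather than sampling, so the function name and procedure here are assumptions, and the sampled value is only a lower bound on the verified one.
```python
import torch


def empirical_adversarial_rate(policy, states, epsilon=0.05, samples=64):
    """Fraction of states for which some sampled perturbation with
    L-infinity norm <= epsilon changes the policy's greedy action."""
    flipped = 0
    with torch.no_grad():
        for s in states:
            base_action = policy(s.unsqueeze(0)).argmax(dim=-1)
            for _ in range(samples):
                noise = (torch.rand_like(s) * 2.0 - 1.0) * epsilon
                if policy((s + noise).unsqueeze(0)).argmax(dim=-1) != base_action:
                    flipped += 1
                    break
    return flipped / len(states)
```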
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
- Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation [79.89605349842569]
We introduce the Collection and Refinement of Online Properties (CROP) framework to design properties at training time.
CROP employs a cost signal to identify unsafe interactions and uses them to shape safety properties.
We evaluate our approach in several robotic mapless navigation tasks and demonstrate that the violation metric computed with CROP allows higher returns and lower violations over previous Safe DRL approaches.
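A minimal sketch of the cost-signal idea (the discretization and refinement rule below are assumptions made for illustration, not CROP's actual mechanism): states that incur a positive cost are grouped into discretized regions to avoid, and re-entering such a region later counts as a violation.
```python
from collections import defaultdict


class OnlinePropertyCollector:
    """Collect simple 'avoid this region' properties online from a cost signal."""

    def __init__(self, resolution=0.5):
        self.resolution = resolution
        self.unsafe_counts = defaultdict(int)

    def _cell(self, state):
        # Discretize a continuous state vector into a grid-cell identifier.
        return tuple(round(x / self.resolution) for x in state)

    def observe(self, state, cost):
        # A positive cost marks the interaction as unsafe; remember its region.
        if cost > 0:
            self.unsafe_counts[self._cell(state)] += 1

    def violates(self, state):
        # Violation check: the agent entered a region already flagged as unsafe.
        return self._cell(state) in self.unsafe_counts
```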
arXiv Detail & Related papers (2023-02-13T21:19:36Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
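A minimal sketch of the filtering idea, with `certify_safe` as a hypothetical stand-in for the control-theoretic certificate used in the paper: execute the nominal RL action only if it can be certified to keep the system in the safe set, otherwise fall back to a known-safe backup controller.
```python
def filtered_action(nominal_policy, backup_controller, certify_safe, state):
    """Confidence-based safety filter (illustrative sketch only)."""
    action = nominal_policy(state)
    if certify_safe(state, action):
        # The certificate confirms the proposed action keeps the system safe.
        return action
    # Otherwise override the learned policy with the known-safe backup law.
    return backup_controller(state)
```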
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking [12.719948223824483]
Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks.
However, vanilla RL and most safe RL approaches do not guarantee safety.
We introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods.
We provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
arXiv Detail & Related papers (2022-05-13T16:34:36Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Assured Learning-enabled Autonomy: A Metacognitive Reinforcement Learning Framework [4.427447378048202]
Reinforcement learning (RL) agents with pre-specified reward functions cannot provide guaranteed safety across a variety of circumstances.
An assured autonomous control framework is presented in this paper by empowering RL algorithms with metacognitive learning capabilities.
arXiv Detail & Related papers (2021-03-23T14:01:35Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.