Safety Filtering While Training: Improving the Performance and Sample Efficiency of Reinforcement Learning Agents
- URL: http://arxiv.org/abs/2410.11671v2
- Date: Mon, 25 Nov 2024 23:44:01 GMT
- Title: Safety Filtering While Training: Improving the Performance and Sample Efficiency of Reinforcement Learning Agents
- Authors: Federico Pizarro Bejarano, Lukas Brunke, Angela P. Schoellig
- Abstract summary: Reinforcement learning (RL) controllers are flexible and performant but rarely guarantee safety.
Safety filters impart hard safety guarantees to RL controllers while maintaining flexibility.
We analyze several modifications that incorporate the safety filter into the training of RL controllers rather than applying it only during evaluation.
- Score: 7.55113002732746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) controllers are flexible and performant but rarely guarantee safety. Safety filters impart hard safety guarantees to RL controllers while maintaining flexibility. However, safety filters can cause undesired behaviours due to the separation between the controller and the safety filter, often degrading performance and robustness. In this paper, we analyze several modifications that incorporate the safety filter into the training of RL controllers rather than applying it only during evaluation. These modifications allow the RL controller to learn to account for the safety filter, improving performance. This paper presents a comprehensive analysis of training RL with safety filters, featuring simulated and real-world experiments with a Crazyflie 2.0 drone. We examine how various training modifications and hyperparameters impact performance, sample efficiency, safety, and chattering. Our findings serve as a guide for practitioners and researchers focused on safety filters and safe RL.
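The core modification the paper studies, applying the safety filter during training rather than only at evaluation, can be illustrated with a minimal training-loop sketch. Everything below (the `agent`, `safety_filter`, and `buffer` interfaces) is a hypothetical stand-in rather than the authors' implementation; which action ends up in the replay buffer is itself one of the training modifications the paper examines.

```python
# Minimal sketch of safety filtering *during* RL training (hypothetical interfaces,
# not the authors' code):
#   agent.act(obs)               -> action proposed by the RL policy
#   safety_filter.filter(obs, a) -> minimally modified action certified as safe
#   agent.update(batch)          -> one gradient step on sampled transitions

def train_with_safety_filter(env, agent, safety_filter, buffer, steps=100_000):
    obs = env.reset()
    for _ in range(steps):
        proposed = agent.act(obs)                      # what the policy wants to do
        safe = safety_filter.filter(obs, proposed)     # certified-safe action
        next_obs, reward, done, info = env.step(safe)  # only the safe action is executed

        # Storing the filtered action lets the policy learn what was actually applied;
        # storing the proposed action instead is another possible training modification.
        buffer.add(obs, safe, reward, next_obs, done)
        agent.update(buffer.sample())

        obs = env.reset() if done else next_obs
    return agent
```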
Related papers
- Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards [23.15178050525514]
Safe Reinforcement Learning (Safe RL) aims to train an RL agent to maximize its performance in real-world environments while adhering to safety constraints.
We propose a novel safe RL approach called Safety Modulated Policy Optimization (SMPO), which enables safe policy function learning.
arXiv Detail & Related papers (2025-04-03T21:35:22Z)
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies [5.923818043882103]
Pruning neural networks (NNs) can streamline them but risks removing vital parameters from safe reinforcement learning (RL) policies.
We introduce an interpretable RL method called VERINTER, which combines NN pruning with model checking to ensure interpretable RL safety.
arXiv Detail & Related papers (2024-09-16T12:13:41Z)
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
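As a rough illustration of shielding in general (not the permissibility-based construction proposed in that paper), a shield intercepts the learner's action and substitutes a safe fallback whenever the proposed action is disallowed; `is_permissible` and `fallback_action` below are hypothetical placeholders.

```python
# Generic shielding sketch (illustrative only; the permissibility-based shield
# construction in the paper above is more involved).

def shielded_step(env, agent, obs, is_permissible, fallback_action):
    action = agent.act(obs)              # action proposed by the RL policy
    if not is_permissible(obs, action):  # shield vetoes disallowed actions
        action = fallback_action(obs)    # e.g. a verified backup controller
    return env.step(action)
```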
arXiv Detail & Related papers (2024-05-29T18:00:21Z)
- Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems [15.863561935347692]
We develop provably safe and convergent reinforcement learning algorithms for control of nonlinear dynamical systems.
Recent advances at the intersection of control and RL follow a two-stage safety-filter approach to enforcing hard safety constraints.
We develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees.
arXiv Detail & Related papers (2024-03-06T19:39:20Z)
- Modular Control Architecture for Safe Marine Navigation: Reinforcement Learning and Predictive Safety Filters [0.0]
Reinforcement learning is increasingly used to adapt to complex scenarios, but standard frameworks ensuring safety and stability are lacking.
Predictive Safety Filters (PSF) offer a promising solution, ensuring constraint satisfaction in learning-based control without explicit constraint handling.
We apply this approach to marine navigation, combining RL with PSF on a simulated Cybership II model.
Results demonstrate the PSF's effectiveness in maintaining safety without hindering the RL agent's learning rate or performance, as evaluated against a standard RL agent without the PSF.
arXiv Detail & Related papers (2023-12-04T12:37:54Z)
- Safe Deep Policy Adaptation [7.2747306035142225]
Policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges.
We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning.
We provide theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA against learning errors and extra perturbations.
arXiv Detail & Related papers (2023-10-08T00:32:59Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
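The multiplicative composition described above can be sketched as follows; the critic modules and names are illustrative assumptions, not the paper's implementation.

```python
import torch

def multiplicative_q(safety_critic, reward_critic, obs, action):
    """Illustrative sketch: the safety critic's probability of remaining
    constraint-free scales the reward critic's constraint-free return estimate."""
    p_safe = torch.sigmoid(safety_critic(obs, action))  # probability in (0, 1)
    q_reward = reward_critic(obs, action)                # constraint-free return
    return p_safe * q_reward                             # multiplicative value
```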
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Safe Model-Based Reinforcement Learning Using Robust Control Barrier Functions [43.713259595810854]
An increasingly common approach to address safety involves the addition of a safety layer that projects the RL actions onto a safe set of actions.
In this paper, we frame safety as a differentiable robust-control-barrier-function layer in a model-based RL framework.
arXiv Detail & Related papers (2021-10-11T17:00:45Z)
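The safety-layer idea in the last entry, projecting the RL action onto a set certified by a control barrier function, can be illustrated with a heavily simplified sketch: a single affine CBF constraint admits a closed-form projection. The robust, differentiable layer in the paper is more sophisticated; the functions below (`h`, `grad_h`, `f`, `g`) are assumed, user-supplied system models.

```python
import numpy as np

def cbf_safety_layer(u_rl, x, h, grad_h, f, g, alpha=1.0):
    """Simplified (non-robust, non-differentiable) CBF filter sketch.
    Enforces grad_h(x) @ (f(x) + g(x) @ u) + alpha * h(x) >= 0, i.e. an affine
    constraint a @ u >= b on the action, by minimally correcting the RL action."""
    a = grad_h(x) @ g(x)                  # constraint direction in action space
    b = -grad_h(x) @ f(x) - alpha * h(x)  # constraint offset
    if not np.any(a) or a @ u_rl >= b:
        return u_rl                       # constraint inactive or already satisfied
    return u_rl + a * (b - a @ u_rl) / (a @ a)  # closest action on the constraint boundary
```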
This list is automatically generated from the titles and abstracts of the papers in this site.