Safe Reinforcement Learning Using Advantage-Based Intervention
- URL: http://arxiv.org/abs/2106.09110v1
- Date: Wed, 16 Jun 2021 20:28:56 GMT
- Title: Safe Reinforcement Learning Using Advantage-Based Intervention
- Authors: Nolan Wagener, Byron Boots, Ching-An Cheng
- Abstract summary: Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
- Score: 45.79740561754542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many sequential decision problems involve finding a policy that maximizes
total reward while obeying safety constraints. Although much recent research
has focused on the development of safe reinforcement learning (RL) algorithms
that produce a safe policy after training, ensuring safety during training as
well remains an open problem. A fundamental challenge is performing exploration
while still satisfying constraints in an unknown Markov decision process (MDP).
In this work, we address this problem for the chance-constrained setting. We
propose a new algorithm, SAILR, that uses an intervention mechanism based on
advantage functions to keep the agent safe throughout training and optimizes
the agent's policy using off-the-shelf RL algorithms designed for unconstrained
MDPs. Our method comes with strong guarantees on safety during both training
and deployment (i.e., after training and without the intervention mechanism)
and policy performance compared to the optimal safety-constrained policy. In
our experiments, we show that SAILR violates constraints far less during
training than standard safe RL and constrained MDP approaches and converges to
a well-performing policy that can be deployed safely without intervention. Our
code is available at https://github.com/nolanwagener/safe_rl.
Related papers
- Towards Safe Load Balancing based on Control Barrier Functions and Deep
Reinforcement Learning [0.691367883100748]
We propose a safe learning-based load balancing algorithm for Software Defined-Wide Area Network (SD-WAN)
It is empowered by Deep Reinforcement Learning (DRL) combined with a Control Barrier Function (CBF)
We show that our approach delivers near-optimal Quality-of-Service (QoS) in terms of end-to-end delay while respecting safety requirements related to link capacity constraints.
arXiv Detail & Related papers (2024-01-10T19:43:12Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - SAUTE RL: Almost Surely Safe Reinforcement Learning Using State
Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (MDPs)
We show that Saute MDP allows to view Safe augmentation problem from a different perspective enabling new features.
arXiv Detail & Related papers (2022-02-14T08:57:01Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - Improving Safety in Deep Reinforcement Learning using Unsupervised
Action Planning [4.2955354157580325]
One of the key challenges to deep reinforcement learning (deep RL) is to ensure safety at both training and testing phases.
We propose a novel technique of unsupervised action planning to improve the safety of on-policy reinforcement learning algorithms.
Our results show that the proposed safety RL algorithm can achieve higher rewards compared with multiple baselines in both discrete and continuous control problems.
arXiv Detail & Related papers (2021-09-29T10:26:29Z) - Minimizing Safety Interference for Safe and Comfortable Automated
Driving with Distributional Reinforcement Learning [3.923354711049903]
We propose a distributional reinforcement learning framework to learn adaptive policies that can tune their level of conservativity at run-time based on the desired comfort and utility.
We show that our algorithm learns policies that can still drive reliable when the perception noise is two times higher than the training configuration for automated merging and crossing at occluded intersections.
arXiv Detail & Related papers (2021-07-15T13:36:55Z) - Safe Distributional Reinforcement Learning [19.607668635077495]
Safety in reinforcement learning (RL) is a key property in both training and execution in many domains such as autonomous driving or finance.
We formalize it with a constrained RL formulation in the distributional RL setting.
We empirically validate our propositions on artificial and real domains against appropriate state-of-the-art safe RL algorithms.
arXiv Detail & Related papers (2021-02-26T13:03:27Z) - Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.