Enhancing Safe Exploration Using Safety State Augmentation
- URL: http://arxiv.org/abs/2206.02675v1
- Date: Mon, 6 Jun 2022 15:23:07 GMT
- Title: Enhancing Safe Exploration Using Safety State Augmentation
- Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Jun Wang, Haitham Bou Ammar
- Abstract summary: We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training.
We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
- Score: 71.00929878212382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safe exploration is a challenging and important problem in model-free
reinforcement learning (RL). Often the safety cost is sparse and unknown, which
unavoidably leads to constraint violations -- a phenomenon ideally to be
avoided in safety-critical applications. We tackle this problem by augmenting
the state-space with a safety state, which is nonnegative if and only if the
constraint is satisfied. The value of this state also serves as a distance
toward constraint violation, while its initial value indicates the available
safety budget. This idea allows us to derive policies for scheduling the safety
budget during training. We call our approach Simmer (Safe policy IMproveMEnt
for RL) to reflect the careful nature of these schedules. We apply this idea to
two safe RL problems: RL with constraints imposed on an average cost, and RL
with constraints imposed on a cost with probability one. Our experiments
suggest that simmering a safe algorithm can improve safety during training for
both settings. We further show that Simmer can stabilize training and improve
the performance of safe RL with average constraints.
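To make the augmentation concrete, here is a minimal sketch in Python. It assumes a Gymnasium-style environment with a Box observation space that reports a per-step safety cost in info["cost"]; the wrapper name, the cost key, and the linear budget schedule are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of safety-state augmentation, in the spirit of Simmer/Saute.
# Assumptions: Gymnasium API, Box observations, per-step cost in info["cost"].
import numpy as np
import gymnasium as gym


class SafetyStateWrapper(gym.Wrapper):
    """Appends a normalized remaining-safety-budget state z to the observation.

    z starts at 1 (full budget) and decreases by cost / budget each step, so
    z >= 0 if and only if the accumulated cost stays within the budget; z also
    measures the distance to constraint violation.
    """

    def __init__(self, env, safety_budget):
        super().__init__(env)
        self.safety_budget = safety_budget
        low = np.append(env.observation_space.low, -np.inf)
        high = np.append(env.observation_space.high, np.inf)
        self.observation_space = gym.spaces.Box(low=low, high=high,
                                                dtype=np.float64)
        self.z = 1.0

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.z = 1.0  # full safety budget at the start of an episode
        return np.append(obs, self.z), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        cost = info.get("cost", 0.0)
        self.z -= cost / self.safety_budget  # spend budget as cost accrues
        info["budget_exhausted"] = self.z < 0.0  # constraint violated
        return np.append(obs, self.z), reward, terminated, truncated, info


def simmer_schedule(step, total_steps, budget_min, budget_max):
    """Illustrative monotone budget schedule for training: start with a
    conservative (small) budget and anneal linearly toward the target."""
    frac = min(step / total_steps, 1.0)
    return budget_min + frac * (budget_max - budget_min)
```

Simmering then corresponds to training against budgets produced by such a schedule, increasing the budget only as the agent handles the current one safely; the paper derives more principled schedules than the linear placeholder above.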
Related papers
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess the contributions of partial state-action trajectories to safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
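A generic dual-ascent update of the kind often used to adapt such a safety/reward tradeoff coefficient, sketched for illustration; this is a standard Lagrangian heuristic, not the specific method of the cited paper, and all names are placeholders.

```python
def update_tradeoff_coefficient(lam, avg_cost, cost_budget, lr=0.01):
    """Raise the penalty weight when average cost exceeds the budget;
    lower it (never below zero) when there is slack."""
    return max(0.0, lam + lr * (avg_cost - cost_budget))

# Example: after each training epoch,
# lam = update_tradeoff_coefficient(lam, measured_avg_cost, budget)
```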
arXiv Detail & Related papers (2024-05-05T17:27:22Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
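A rough sketch of the multiplicative composition described above, assuming a safety critic that outputs the probability of staying within constraints and a reward critic trained on constraint-free returns; the network shapes and names below are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn


class MultiplicativeValue(nn.Module):
    """Combines a safety critic and a reward critic multiplicatively:
    V(s) = P(safe | s) * V_reward(s), so states likely to violate the
    constraint have their estimated value discounted toward zero."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.safety_critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid())  # probability of remaining safe
        self.reward_critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))  # constraint-free return estimate

    def forward(self, obs):
        return self.safety_critic(obs) * self.reward_critic(obs)
```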
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Safe Model-Based Reinforcement Learning with an Uncertainty-Aware
Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the DRC and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
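A generic backtracking line search of the kind gestured at above, sketched under assumptions: a NumPy parameter vector, a candidate update direction, a return estimate, and a safety-certificate check, all with illustrative names; the paper's certificate-based condition is not reproduced here.

```python
def backtracking_safe_line_search(theta, direction, estimate_return,
                                  is_certified_safe, step=1.0, shrink=0.5,
                                  max_iters=10):
    """Shrink the update step until the candidate parameters both improve the
    estimated return and pass the safety check; otherwise keep theta."""
    base_return = estimate_return(theta)
    for _ in range(max_iters):
        candidate = theta + step * direction
        if is_certified_safe(candidate) and estimate_return(candidate) > base_return:
            return candidate
        step *= shrink  # back off and try a smaller step
    return theta  # no safe improving step found
```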
arXiv Detail & Related papers (2022-10-14T06:16:53Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - SAUTE RL: Almost Surely Safe Reinforcement Learning Using State
Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (MDPs).
We show that Saute MDPs allow viewing the safety-augmentation problem from a different perspective, enabling new features.
arXiv Detail & Related papers (2022-02-14T08:57:01Z) - Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
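One common way such a conservative critic can gate exploration at action-selection time, sketched under assumptions: policy_sample(obs) draws an action, safety_critic(obs, action) returns an estimated failure probability, and eps is a rejection threshold. All names are illustrative, and this is a generic pattern rather than the paper's exact algorithm.

```python
def select_safe_action(policy_sample, safety_critic, obs, eps=0.1, max_tries=10):
    """Rejection-sample actions, keeping the first whose estimated failure
    probability (from a conservative critic) is below eps; otherwise fall
    back to the least-risky candidate seen."""
    candidates = []
    for _ in range(max_tries):
        action = policy_sample(obs)
        risk = safety_critic(obs, action)  # conservative estimate in [0, 1]
        if risk < eps:
            return action
        candidates.append((risk, action))
    return min(candidates, key=lambda pair: pair[0])[1]
```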
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.