Safe Deep Reinforcement Learning by Verifying Task-Level Properties
- URL: http://arxiv.org/abs/2302.10030v1
- Date: Mon, 20 Feb 2023 15:24:06 GMT
- Title: Safe Deep Reinforcement Learning by Verifying Task-Level Properties
- Authors: Enrico Marchesini, Luca Marzari, Alessandro Farinelli, Christopher Amato
- Abstract summary: Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL).
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of unsafe states by defining a violation metric.
- Score: 84.64203221849648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cost functions are commonly employed in Safe Deep Reinforcement Learning
(DRL). However, the cost is typically encoded as an indicator function due to
the difficulty of quantifying the risk of policy decisions in the state space.
Such an encoding requires the agent to visit numerous unsafe states to learn a
cost-value function to drive the learning process toward safety. This increases
the number of unsafe interactions and decreases sample efficiency.
In this paper, we investigate an alternative approach that uses domain
knowledge to quantify the risk in the proximity of such states by defining a
violation metric. This metric is computed by verifying task-level properties,
shaped as input-output conditions, and it is used as a penalty to bias the
policy away from unsafe states without learning an additional value function.
We investigate the benefits of using the violation metric in standard Safe DRL
benchmarks and robotic mapless navigation tasks. The navigation experiments
bridge the gap between Safe DRL and robotics, introducing a framework that
allows rapid testing on real robots. Our experiments show that policies trained
with the violation penalty achieve higher performance than Safe DRL baselines
and significantly reduce the number of visited unsafe states.
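To make the role of the violation metric concrete, below is a minimal sketch, not the authors' formulation (the paper obtains the metric by formally verifying task-level properties of the policy rather than by sampling): hypothetical input-output properties are checked on states drawn around the current one, and the resulting violation fraction is used as a reward penalty in place of a learned cost-value function. The Property class, the sampling radius eps, and the penalty weight lam are illustrative assumptions.

```python
import numpy as np

class Property:
    """Hypothetical input-output property: if the precondition holds on the
    input (state), the policy's output (action) must satisfy the postcondition."""
    def __init__(self, precondition, postcondition):
        self.precondition = precondition    # state -> bool
        self.postcondition = postcondition  # action -> bool

def violation_metric(policy, state, properties, eps=0.05, n_samples=32):
    """Fraction of states, sampled in an eps-box around `state`, on which the
    policy violates at least one property (an empirical stand-in for the
    verified metric)."""
    violations = 0
    for _ in range(n_samples):
        perturbed = state + np.random.uniform(-eps, eps, size=state.shape)
        action = policy(perturbed)
        if any(p.precondition(perturbed) and not p.postcondition(action)
               for p in properties):
            violations += 1
    return violations / n_samples

def shaped_reward(reward, policy, state, properties, lam=1.0):
    # Bias the policy away from unsafe states by penalizing the task reward
    # with the violation metric instead of learning an additional value function.
    return reward - lam * violation_metric(policy, state, properties)
```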
Related papers
- Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning [0.0]
We propose a safe reinforcement learning (RL) approach that utilizes an anomalous state sequence to enhance RL safety.
In experiments on multiple safety-critical environments including self-driving cars, our solution approach successfully learns safer policies.
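The summary gives few details, so the following is only a loose sketch of how anomalous state sequences could feed into reward shaping: a model trained on state windows from nominal (safe) trajectories scores the current window, and a high reconstruction error is treated as an anomaly and penalized. The reconstruct model, threshold, and weight are assumptions, not the paper's method.

```python
import numpy as np

def anomaly_penalty(recent_states, reconstruct, threshold=0.5, weight=1.0):
    """Hypothetical shaping term: `reconstruct` is a model trained on state
    windows from safe trajectories; a large reconstruction error marks the
    current window as anomalous and yields a negative shaping reward."""
    window = np.concatenate(recent_states)
    error = float(np.mean((reconstruct(window) - window) ** 2))
    return -weight * max(0.0, error - threshold)
```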
arXiv Detail & Related papers (2024-07-29T10:30:07Z)
- Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess the contributions of partial state-action trajectories to safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
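One standard way to realize such a dynamic tradeoff, used here only as an illustrative stand-in for the paper's method, is a Lagrangian-style update that raises the coefficient when the observed violation rate exceeds a compliance target and lowers it otherwise; the budget, learning rate, and clipping bounds are assumptions.

```python
def update_tradeoff(coef, violation_rate, budget=0.05, lr=0.01,
                    coef_min=0.0, coef_max=100.0):
    """Increase the safety weight when violations exceed the budget and relax
    it when the policy is comfortably within the budget."""
    coef += lr * (violation_rate - budget)
    return min(max(coef, coef_min), coef_max)

# Per-update objective (sketch): total = task_return - coef * safety_cost
```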
arXiv Detail & Related papers (2024-05-05T17:27:22Z)
- Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z)
- ROSARL: Reward-Only Safe Reinforcement Learning [11.998722332188]
An important problem in reinforcement learning is designing agents that learn to solve tasks safely in an environment.
A common solution is for a human expert to define either a penalty in the reward function or a cost to be minimised when reaching unsafe states.
This is non-trivial, since too small a penalty may lead to agents that reach unsafe states, while too large a penalty increases the time to convergence.
arXiv Detail & Related papers (2023-05-31T08:33:23Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
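A minimal sketch of the multiplicative idea as described in the summary (network sizes and the sigmoid parameterization are assumptions): the value used for policy improvement is the constraint-free return estimate discounted by the predicted probability of remaining safe.

```python
import torch
import torch.nn as nn

class MultiplicativeValue(nn.Module):
    """Reward critic estimates constraint-free returns; safety critic predicts
    the probability of constraint violation and discounts that estimate."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.reward_critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.safety_critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, obs):
        v_reward = self.reward_critic(obs)                    # constraint-free return
        p_violation = torch.sigmoid(self.safety_critic(obs))  # P(violation)
        return (1.0 - p_violation) * v_reward                 # discounted value
```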
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation [79.89605349842569]
We introduce the Collection and Refinement of Online Properties (CROP) framework to design properties at training time.
CROP employs a cost signal to identify unsafe interactions and uses them to shape safety properties.
We evaluate our approach in several robotic mapless navigation tasks and demonstrate that the violation metric computed with CROP yields higher returns and fewer violations than previous Safe DRL approaches.
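As a rough illustration of turning a cost signal into properties (not the exact CROP procedure), the sketch below wraps each costly interaction in a small input box paired with the action taken there, which later property checks should flag as a violation; the box size eps and the action-matching tolerance are assumptions.

```python
import numpy as np

def collect_properties(transitions, eps=0.1):
    """Turn interactions flagged by the cost signal into input-output
    properties: within eps of an observed unsafe state, the action chosen
    there should be avoided."""
    properties = []
    for state, action, cost in transitions:
        if cost > 0:  # unsafe interaction identified by the cost signal
            properties.append((state - eps, state + eps, action))
    return properties

def violates(prop, state, action, tol=1e-3):
    lower, upper, unsafe_action = prop
    in_region = np.all(state >= lower) and np.all(state <= upper)
    return bool(in_region) and np.linalg.norm(action - unsafe_action) < tol
```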
arXiv Detail & Related papers (2023-02-13T21:19:36Z)
- Safe Reinforcement Learning using Data-Driven Predictive Control [0.5459797813771499]
We propose a data-driven safety layer that acts as a filter for unsafe actions.
The safety layer penalizes the RL agent if the proposed action is unsafe and replaces it with the closest safe one.
In a simulation, we show that our method outperforms state-of-the-art safe RL methods on the robotics navigation problem.
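A minimal sketch of the filtering behavior described above, assuming a candidate set of safe actions and a predictive is_safe model (both stand-ins for the paper's data-driven components): unsafe proposals are swapped for the closest safe candidate and the agent receives a penalty.

```python
import numpy as np

def safety_layer(proposed_action, safe_actions, is_safe, penalty=1.0):
    """If the proposed action is predicted unsafe, replace it with the closest
    candidate from `safe_actions` and return a negative reward correction."""
    if is_safe(proposed_action):
        return proposed_action, 0.0
    distances = [np.linalg.norm(a - proposed_action) for a in safe_actions]
    closest = safe_actions[int(np.argmin(distances))]
    return closest, -penalty
```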
arXiv Detail & Related papers (2022-11-20T17:10:40Z)
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provably safe RL when given access to an offline oracle providing binary feedback on the safety of state-action pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- Evaluating the Safety of Deep Reinforcement Learning Models using Semi-Formal Verification [81.32981236437395]
We present a semi-formal verification approach for decision-making tasks based on interval analysis.
Our method obtains results comparable to formal verifiers on standard benchmarks.
Our approach enables efficient evaluation of safety properties for decision-making models in practical applications.
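Interval analysis of a network is the core ingredient here; the sketch below shows the standard bound-propagation step for affine layers with ReLU activations (the layer format and the checked condition are illustrative assumptions, not the paper's tool). A property such as "output 0 stays below 0.5 for every input in the box" holds whenever the propagated upper bound of output 0 is below 0.5.

```python
import numpy as np

def interval_forward(layers, lower, upper):
    """Propagate an input box [lower, upper] through a ReLU network given as a
    list of (W, b) pairs, returning elementwise output bounds."""
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        new_lower = W_pos @ lower + W_neg @ upper + b
        new_upper = W_pos @ upper + W_neg @ lower + b
        if i < len(layers) - 1:  # ReLU on hidden layers only
            new_lower, new_upper = np.maximum(new_lower, 0), np.maximum(new_upper, 0)
        lower, upper = new_lower, new_upper
    return lower, upper
```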
arXiv Detail & Related papers (2020-10-19T11:18:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.