Safe Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2102.13446v1
- Date: Fri, 26 Feb 2021 13:03:27 GMT
- Title: Safe Distributional Reinforcement Learning
- Authors: Jianyi Zhang, Paul Weng
- Abstract summary: Safety in reinforcement learning (RL) is a key property in both training and execution in many domains such as autonomous driving or finance.
We formalize it with a constrained RL formulation in the distributional RL setting.
We empirically validate our propositions on artificial and real domains against appropriate state-of-the-art safe RL algorithms.
- Score: 19.607668635077495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety in reinforcement learning (RL) is a key property in both training and
execution in many domains such as autonomous driving or finance. In this paper,
we formalize it with a constrained RL formulation in the distributional RL
setting. Our general model accepts various definitions of safety (e.g., bounds
on expected performance, CVaR, variance, or probability of reaching bad
states). To ensure safety during learning, we extend a safe policy optimization
method to solve our problem. The distributional RL perspective leads to a more
efficient algorithm while additionally catering for natural safe constraints.
We empirically validate our propositions on artificial and real domains against
appropriate state-of-the-art safe RL algorithms.
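To make the constraint types concrete, here is a minimal sketch (not the authors' code; the quantile representation and all thresholds are assumptions) of how CVaR and variance bounds could be checked from the quantile atoms of a distributional return estimate, as produced by quantile-based distributional critics:

```python
import numpy as np

def cvar(quantiles: np.ndarray, alpha: float) -> float:
    """Conditional value-at-risk: the mean of the worst alpha-fraction of
    outcomes, computed from equally weighted samples (e.g., the quantile
    atoms of a distributional critic)."""
    sorted_q = np.sort(quantiles)
    k = max(1, int(np.ceil(alpha * len(sorted_q))))
    return float(sorted_q[:k].mean())

def satisfies_safety(quantiles: np.ndarray,
                     cvar_alpha: float = 0.1,
                     cvar_bound: float = -10.0,
                     var_bound: float = 25.0) -> bool:
    """Illustrative constraint check: a lower bound on the CVaR of the return
    and an upper bound on its variance (both thresholds are made-up values)."""
    return (cvar(quantiles, cvar_alpha) >= cvar_bound
            and float(np.var(quantiles)) <= var_bound)

# Toy usage: 32 quantile atoms of a return distribution under some policy.
atoms = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=32)
print(cvar(atoms, 0.1), satisfies_safety(atoms))
```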
Related papers
- Safety Optimized Reinforcement Learning via Multi-Objective Policy Optimization [3.425378723819911]
Safe reinforcement learning (Safe RL) refers to a class of techniques that aim to prevent RL algorithms from violating constraints.
In this paper, a novel model-free Safe RL algorithm, formulated within the multi-objective policy optimization framework, is introduced.
arXiv Detail & Related papers (2024-02-23T08:58:38Z)
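As a toy illustration of the multi-objective view above (a hypothetical linear scalarization, not the paper's actual formulation), one can score candidate policy updates on task return and constraint violation and combine the two objectives:

```python
# Hypothetical two-objective view: score each candidate policy update on task
# return and constraint violation, then combine them with a safety weight.
def scalarize(task_return: float, violation_cost: float, w: float = 5.0) -> float:
    """Linear scalarization; the linear form and the weight w are illustrative
    assumptions, not the paper's actual objective."""
    return task_return - w * violation_cost

candidates = {"update_a": (12.0, 0.5), "update_b": (10.0, 0.0)}
best = max(candidates, key=lambda k: scalarize(*candidates[k]))
print(best)  # with w = 5.0 the safer update_b wins: 10.0 > 9.5
```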
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
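A minimal sketch of the look-ahead shielding idea, assuming a learned one-step model with a `step(state, action)` method and an `is_unsafe` predicate (the names, toy dynamics, and one-step horizon are illustrative, not the paper's algorithm):

```python
class ToyModel:
    """Stand-in for a learned approximate dynamics model on a 1-D corridor."""
    def step(self, state: int, action: int) -> int:
        return state + action

def is_unsafe(state: int) -> bool:
    return state < 0  # positions below zero are treated as unsafe

def shielded_action(state, candidates, model, unsafe, fallback):
    """Return the first candidate action whose predicted successor is safe
    under the approximate model; otherwise use a designated fallback."""
    for action in candidates:
        if not unsafe(model.step(state, action)):
            return action
    return fallback

print(shielded_action(0, [-1, +1], ToyModel(), is_unsafe, fallback=0))  # +1
```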
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
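The multiplicative structure described above can be sketched as follows (a hedged PyTorch illustration; the network sizes and layer choices are placeholders, not the paper's architecture):

```python
import torch
import torch.nn as nn

class MultiplicativeCritic(nn.Module):
    """A safety critic estimating the probability of remaining constraint-free
    scales a reward critic estimating constraint-free returns."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.safety = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1), nn.Sigmoid())
        self.reward = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # A low safety probability discounts the estimated return toward zero.
        return self.safety(obs) * self.reward(obs)

critic = MultiplicativeCritic(obs_dim=4)
print(critic(torch.zeros(1, 4)))  # value of an all-zeros observation
```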
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provably safe RL when given access to an offline oracle providing binary feedback on the safety of state-action pairs.
We provide a novel meta-algorithm, SABRE, which can be applied to any MDP setting given access to a black-box PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the distributional reachability certificate (DRC) and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
arXiv Detail & Related papers (2022-10-14T06:16:53Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- SafeRL-Kit: Evaluating Efficient Reinforcement Learning Methods for Safe Autonomous Driving [12.925039760573092]
We release SafeRL-Kit to benchmark safe RL methods for autonomous driving tasks.
SafeRL-Kit contains several of the latest algorithms specific to zero-constraint-violation tasks, including Safety Layer, Recovery RL, the off-policy Lagrangian method, and Feasible Actor-Critic.
We conduct a comparative evaluation of the above algorithms in SafeRL-Kit and shed light on their efficacy for safe autonomous driving.
arXiv Detail & Related papers (2022-06-17T03:23:51Z)
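For context, the off-policy Lagrangian method mentioned above relaxes the constrained problem by optimizing reward minus a λ-weighted cost while adapting λ by dual ascent on the constraint; a minimal sketch (the step size and budget are made-up values):

```python
def lagrange_step(lmbda: float, avg_cost: float, budget: float,
                  lr: float = 0.01) -> float:
    """Dual ascent on the multiplier: increase lambda when the measured cost
    exceeds the budget, let it decay otherwise; lambda stays non-negative."""
    return max(0.0, lmbda + lr * (avg_cost - budget))

lmbda = 0.0
for cost in [5.0, 4.0, 2.0, 1.0]:  # average episode costs over iterations
    lmbda = lagrange_step(lmbda, cost, budget=2.0)
    print(f"cost={cost:.1f} -> lambda={lmbda:.3f}")
```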
- Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking [12.719948223824483]
Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlocking their potential for many real-world tasks.
However, vanilla RL and most safe RL approaches do not guarantee safety.
We introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods.
We provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
arXiv Detail & Related papers (2022-05-13T16:34:36Z)
- SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (Saute MDPs).
We show that Saute MDPs allow viewing the safety-augmentation problem from a different perspective, enabling new features.
arXiv Detail & Related papers (2022-02-14T08:57:01Z)
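The state-augmentation idea can be sketched as an environment wrapper that appends the remaining safety budget to the observation and ends the episode when it is exhausted (the gym-style interface, the `cost` field in `info`, and the termination rule are illustrative assumptions, not the Saute MDP construction itself):

```python
import numpy as np

class SafetyBudgetWrapper:
    """Appends the remaining safety budget to the observation and spends it
    on the per-step cost; an exhausted budget terminates the episode."""
    def __init__(self, env, budget: float):
        self.env, self.budget0 = env, budget

    def reset(self):
        self.budget = self.budget0
        return np.append(self.env.reset(), self.budget)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.budget -= info.get("cost", 0.0)  # assumed per-step cost signal
        if self.budget <= 0.0:                # budget exhausted: stop early
            done = True
        return np.append(obs, max(self.budget, 0.0)), reward, done, info
```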
- Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
arXiv Detail & Related papers (2021-06-16T20:28:56Z)
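A minimal sketch of an advantage-based intervention rule in the spirit of SAILR, assuming a learned safety-advantage estimate and a backup policy (the names and the zero threshold are illustrative; see the paper for the actual criterion):

```python
def act_with_intervention(state, proposed_action, safety_advantage,
                          backup_policy, threshold: float = 0.0):
    """Let the learner act freely unless the estimated safety advantage of
    its proposed action exceeds the threshold, in which case the backup
    (intervention) policy takes over."""
    if safety_advantage(state, proposed_action) > threshold:
        return backup_policy(state)
    return proposed_action
```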
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.