TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2312.00344v1
- Date: Fri, 1 Dec 2023 04:40:47 GMT
- Title: TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning
- Authors: Dohyeong Kim and Songhwai Oh
- Abstract summary: We propose a trust region-based safe RL method with CVaR constraints, called TRC.
We first derive the upper bound on CVaR and then approximate the upper bound in a differentiable form in a trust region.
Compared to other safe RL methods, the performance is improved by 1.93 times while the constraints are satisfied in all experiments.
- Score: 16.176812250762666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As safety is of paramount importance in robotics, reinforcement learning that
reflects safety, called safe RL, has been studied extensively. In safe RL, we
aim to find a policy which maximizes the desired return while satisfying the
defined safety constraints. There are various types of constraints, among which
constraints on conditional value at risk (CVaR) effectively lower the
probability of failures caused by high costs since CVaR is a conditional
expectation obtained above a certain percentile. In this paper, we propose a
trust region-based safe RL method with CVaR constraints, called TRC. We first
derive the upper bound on CVaR and then approximate the upper bound in a
differentiable form in a trust region. Using this approximation, a subproblem
to get policy gradients is formulated, and policies are trained by iteratively
solving the subproblem. TRC is evaluated through safe navigation tasks in
simulations with various robots and a sim-to-real environment with a Jackal
robot from Clearpath. Compared to other safe RL methods, the performance is
improved by 1.93 times while the constraints are satisfied in all experiments.
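As a concrete illustration of the constraint described above, the following Python sketch estimates CVaR empirically from sampled episode costs as the mean of the costs above the (1 - alpha) percentile and checks it against a cost limit. The function names, the risk level alpha, and the threshold are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def empirical_cvar(costs, alpha=0.1):
        # CVaR_alpha: the conditional expectation of the cost above its
        # (1 - alpha)-quantile, i.e. the mean of the worst alpha-fraction.
        costs = np.asarray(costs, dtype=float)
        var = np.quantile(costs, 1.0 - alpha)   # value at risk (VaR)
        tail = costs[costs >= var]              # costs in the upper tail
        return tail.mean()

    def cvar_constraint_satisfied(costs, threshold, alpha=0.1):
        # The safe RL constraint: CVaR_alpha of the cost must stay below a limit.
        return empirical_cvar(costs, alpha) <= threshold

    # Example: per-episode cumulative costs collected from rollouts.
    episode_costs = np.random.exponential(scale=1.0, size=1000)
    print(empirical_cvar(episode_costs, alpha=0.1))
    print(cvar_constraint_satisfied(episode_costs, threshold=3.0, alpha=0.1))

Constraining this tail expectation rather than the mean cost is what lowers the probability of rare but high-cost failures.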
Related papers
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk [16.176812250762666]
An on-policy safe RL method, called TRC, deals with a CVaR-constrained RL problem using a trust region method.
To achieve outstanding performance in complex environments and satisfy safety constraints quickly, RL methods are required to be sample efficient.
We propose novel surrogate functions in which the effect of the distributional shift can be reduced, and we introduce an adaptive trust-region constraint that keeps the policy from deviating far from the replay buffers.
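One plausible reading of the adaptive trust-region idea, sketched below purely as an assumption and not as the paper's actual rule, is to shrink the allowed policy-update step as the estimated divergence between the current policy and the replay-buffer behavior policies grows.

    def adaptive_trust_region_bound(kl_to_buffer, base_bound=0.01, max_shift=0.1):
        # kl_to_buffer: estimated KL divergence between the current policy and
        # the behavior policies in the replay buffer (a proxy for distributional
        # shift). The larger the shift, the tighter the next update's trust region.
        shrink = max(0.0, 1.0 - kl_to_buffer / max_shift)
        return base_bound * shrink

    # Example: a moderate shift leaves 70% of the nominal trust-region size.
    print(adaptive_trust_region_bound(kl_to_buffer=0.03))  # 0.007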
arXiv Detail & Related papers (2023-12-01T04:29:19Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic, which estimates only constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
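The multiplicative combination can be read as scaling the reward critic's estimate by the probability of remaining safe; a minimal sketch under that assumption (names are illustrative, not the paper's code):

    def multiplicative_value(reward_value, violation_prob):
        # reward_value: constraint-free return estimated by the reward critic.
        # violation_prob: probability of a constraint violation estimated by the
        # safety critic, in [0, 1]; risky states are discounted toward zero.
        return (1.0 - violation_prob) * reward_value

    # Example: a large return is heavily discounted when violation is likely.
    print(multiplicative_value(reward_value=10.0, violation_prob=0.9))   # 1.0
    print(multiplicative_value(reward_value=10.0, violation_prob=0.05))  # 9.5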
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safe Deep Reinforcement Learning by Verifying Task-Level Properties [84.64203221849648]
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL).
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric.
arXiv Detail & Related papers (2023-02-20T15:24:06Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the DRC and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
arXiv Detail & Related papers (2022-10-14T06:16:53Z)
- Minimizing Safety Interference for Safe and Comfortable Automated Driving with Distributional Reinforcement Learning [3.923354711049903]
We propose a distributional reinforcement learning framework to learn adaptive policies that can tune their level of conservativity at run-time based on the desired comfort and utility.
We show that our algorithm learns policies that can still drive reliably when the perception noise is two times higher than in the training configuration for automated merging and crossing at occluded intersections.
arXiv Detail & Related papers (2021-07-15T13:36:55Z)
- Safe Distributional Reinforcement Learning [19.607668635077495]
Safety in reinforcement learning (RL) is a key property in both training and execution in many domains such as autonomous driving or finance.
We formalize it with a constrained RL formulation in the distributional RL setting.
We empirically validate our propositions on artificial and real domains against appropriate state-of-the-art safe RL algorithms.
arXiv Detail & Related papers (2021-02-26T13:03:27Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
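A hedged sketch of how such a conservative critic could gate exploration, assuming it outputs a pessimistic failure probability per state-action pair (the function and variable names below are illustrative, not the paper's API):

    import numpy as np

    def filter_action(state, policy_sample, safety_critic, safe_fallback,
                      threshold=0.1, max_tries=10):
        # Resample candidate actions until the conservative failure estimate is
        # below the threshold; otherwise fall back to a known-safe action.
        for _ in range(max_tries):
            action = policy_sample(state)
            if safety_critic(state, action) <= threshold:
                return action
        return safe_fallback(state)

    # Toy usage with stand-in functions.
    action = filter_action(
        state=np.zeros(4),
        policy_sample=lambda s: np.random.uniform(-1.0, 1.0, size=2),
        safety_critic=lambda s, a: float(np.linalg.norm(a) > 1.2),
        safe_fallback=lambda s: np.zeros(2),
    )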
arXiv Detail & Related papers (2020-10-27T17:54:25Z)