Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery
- URL: http://arxiv.org/abs/2306.13944v1
- Date: Sat, 24 Jun 2023 12:02:50 GMT
- Title: Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery
- Authors: Xiao Zhang, Hai Zhang, Hongtu Zhou, Chang Huang, Di Zhang, Chen Ye,
Junqiao Zhao
- Abstract summary: Safety is one of the main challenges in applying reinforcement learning to tasks in realistic environments.
We propose a method to construct a boundary that discriminates between safe and unsafe states.
Our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
- Score: 13.333197887318168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety is one of the main challenges in applying reinforcement learning to
tasks in realistic environments. To ensure safety during and after the training
process, existing methods tend to adopt overly conservative policies to avoid
unsafe situations. However, an overly conservative policy severely hinders
exploration and makes the algorithm substantially less rewarding. In this
paper, we propose a method to construct a boundary that discriminates between
safe and unsafe states. The boundary we construct is equivalent to
distinguishing dead-end states, indicating the maximum extent to which safe
exploration is guaranteed, and thus imposing the minimum limitation on
exploration. Similar to Recovery Reinforcement Learning, we utilize a decoupled
RL framework to learn two policies: (1) a task policy that considers only task
performance, and (2) a recovery policy that maximizes safety. The recovery
policy and a corresponding safety critic are pretrained on an offline dataset,
in which the safety critic evaluates an upper bound on the safety of each
state, providing the agent with awareness of environmental safety. During
online training, a behavior correction mechanism is adopted, ensuring that the
agent interacts with the environment using only safe actions. Finally,
experiments on continuous control tasks demonstrate that our approach achieves
better task performance with fewer safety violations than state-of-the-art
algorithms.
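To make the behavior correction mechanism concrete, below is a minimal sketch (not the authors' released code) of how a pretrained safety critic can gate the task policy's proposals and fall back to the recovery policy near a dead-end. The function names, the risk convention, and the threshold are all assumptions for illustration.

```python
# Minimal sketch of safety-critic-gated action selection, assuming the critic
# returns an estimated risk in [0, 1] and that crossing a threshold marks the
# dead-end boundary. All names and the threshold value are illustrative.

from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]


def corrected_action(
    state: State,
    task_policy: Callable[[State], Action],
    recovery_policy: Callable[[State], Action],
    safety_critic: Callable[[State, Action], float],  # estimated risk in [0, 1]
    risk_threshold: float = 0.5,
) -> Action:
    """Return the task action if it is judged safe, otherwise a recovery action."""
    proposal = task_policy(state)
    if safety_critic(state, proposal) < risk_threshold:
        return proposal            # judged safe: pursue task reward
    return recovery_policy(state)  # judged near a dead-end: steer back to safety


# Toy usage with stand-in policies: the critic flags states whose first
# coordinate would move past 0.8 as approaching a dead-end.
if __name__ == "__main__":
    task = lambda s: [1.0]      # always push forward
    recover = lambda s: [-1.0]  # always back off
    critic = lambda s, a: 0.9 if s[0] + 0.1 * a[0] > 0.8 else 0.1

    print(corrected_action([0.2], task, recover, critic))   # -> [1.0]  (task action kept)
    print(corrected_action([0.85], task, recover, critic))  # -> [-1.0] (recovery override)
```

In the method described in the abstract, the gating decision would correspond to the learned dead-end boundary rather than a hand-picked constant threshold.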
Related papers
- Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess the contributions of partial state-action trajectories to safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
arXiv Detail & Related papers (2024-05-05T17:27:22Z)
- State-Wise Safe Reinforcement Learning With Pixel Observations [12.338614299403305]
We propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions.
As a joint learning framework, our approach begins by constructing a latent dynamics model with low-dimensional latent spaces derived from pixel observations.
We then build and learn a latent barrier-like function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return.
arXiv Detail & Related papers (2023-11-03T20:32:30Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
arXiv Detail & Related papers (2021-06-16T20:28:56Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
- Provably Safe PAC-MDP Exploration Using Analogies [87.41775218021044]
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration and safety.
We propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown dynamics.
Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense.
arXiv Detail & Related papers (2020-07-07T15:50:50Z)
- Verifiably Safe Exploration for End-to-End Reinforcement Learning [17.401496872603943]
This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs.
It is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints.
arXiv Detail & Related papers (2020-07-02T16:12:20Z)