Safe Reinforcement Learning with Dual Robustness
- URL: http://arxiv.org/abs/2309.06835v1
- Date: Wed, 13 Sep 2023 09:34:21 GMT
- Title: Safe Reinforcement Learning with Dual Robustness
- Authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Yujie Yang, Shengbo Eben Li
- Abstract summary: Reinforcement learning (RL) agents are vulnerable to adversarial disturbances.
We propose a systematic framework to unify safe RL and robust RL.
We also design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC).
- Score: 10.455148541147796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) agents are vulnerable to adversarial
disturbances, which can deteriorate task performance or compromise safety
specifications. Existing methods either address safety requirements under the
assumption of no adversary (e.g., safe RL) or only focus on robustness against
performance adversaries (e.g., robust RL). Learning one policy that is both
safe and robust remains a challenging open problem. The difficulty lies in
tackling two intertwined aspects in the worst case: feasibility and optimality.
Optimality is only valid inside the feasible region, while identifying the
maximal feasible region relies on learning the optimal policy. To address
this issue, we propose a systematic framework to unify safe RL and robust RL,
including problem formulation, iteration scheme, convergence analysis and
practical algorithm design. This unification is built upon constrained
two-player zero-sum Markov games. A dual policy iteration scheme is proposed,
which simultaneously optimizes a task policy and a safety policy. The
convergence of this iteration scheme is proved. Furthermore, we design a deep
RL algorithm for practical implementation, called dually robust actor-critic
(DRAC). Evaluations on safety-critical benchmarks demonstrate that DRAC
achieves high performance and persistent safety under all scenarios (no
adversary, safety adversary, performance adversary), outperforming all
baselines significantly.
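To make the dual iteration concrete, below is a minimal tabular sketch of the idea described in the abstract: a zero-sum safety game identifies the worst-case feasible region, and a zero-sum task game is then solved inside it. This is a sketch under stated assumptions, not the paper's DRAC implementation; the random toy game, the discount factor, the cost threshold, and all variable names (P, R, C, threshold) are illustrative.

```python
# Minimal tabular sketch of a constrained two-player zero-sum Markov game:
# a safety game bounds the worst-case cost-to-go (feasibility), and a task
# game is solved inside the resulting feasible region. NOT the paper's DRAC
# code; the random toy game and all names here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nB, gamma = 5, 3, 2, 0.9                     # states, task actions, adversary actions, discount
P = rng.dirichlet(np.ones(nS), size=(nS, nA, nB))    # transition kernel P[s, a, b, s']
R = rng.uniform(0.0, 1.0, size=(nS, nA, nB))         # task reward
C = rng.uniform(0.0, 1.0, size=(nS, nA, nB))         # safety cost (lower is safer)
threshold = 3.0                                      # assumed bound on worst-case cost-to-go

def discounted_next(V):
    # gamma * E_{s' ~ P[s, a, b, .]}[V(s')], shape (nS, nA, nB); caller adds the immediate term
    return gamma * (P @ V)

# Step 1: safety game -- the safety policy minimizes, and the safety adversary
# maximizes, the discounted cost-to-go.
Vc = np.zeros(nS)
for _ in range(200):
    Vc = (C + discounted_next(Vc)).max(axis=2).min(axis=1)
safety_policy = (C + discounted_next(Vc)).max(axis=2).argmin(axis=1)
feasible = Vc <= threshold                           # states whose worst-case cost-to-go stays bounded

# Step 2: task game inside the feasible region -- the task policy maximizes, and
# the performance adversary minimizes, the return (infeasible states are zeroed
# here as a simplification; the paper couples the two iterations more tightly
# and proves convergence of the joint scheme).
Vr = np.zeros(nS)
for _ in range(200):
    Vr = np.where(feasible, (R + discounted_next(Vr)).min(axis=2).max(axis=1), 0.0)
task_policy = (R + discounted_next(Vr)).min(axis=2).argmax(axis=1)

# Deploy: task policy inside the feasible region, safety policy outside it.
deployed_policy = np.where(feasible, task_policy, safety_policy)
print("feasible states:", np.flatnonzero(feasible))
print("deployed policy:", deployed_policy)
```

In the deep counterpart described by the abstract (DRAC), the two tabular value iterations would be replaced by actor-critic updates with function approximation, but the structural split between the safety game and the task game is the same.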
Related papers
- Robust Safe Reinforcement Learning under Adversarial Disturbances [12.145611442959602]
Safety is a primary concern when applying reinforcement learning to real-world control tasks.
Existing safe reinforcement learning algorithms rarely account for external disturbances.
This paper proposes a robust safe reinforcement learning framework that tackles worst-case disturbances.
arXiv Detail & Related papers (2023-10-11T05:34:46Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provable safe RL when given access to an offline oracle providing binary feedback on the safety of state, action pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- On the Robustness of Safe Reinforcement Learning under Observational Perturbations [27.88525130218356]
We show that baseline adversarial attack techniques for standard RL tasks are not always effective for safe RL.
One interesting and counter-intuitive finding is that the maximum reward attack is strong, as it can both induce unsafe behaviors and make the attack stealthy by maintaining the reward.
This work sheds light on the inherited connection between observational robustness and safety in RL and provides a pioneer work for future safe RL studies.
arXiv Detail & Related papers (2022-05-29T15:25:03Z)
- Robust Policy Learning over Multiple Uncertainty Sets [91.67120465453179]
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments.
We develop an algorithm that enjoys the benefits of both system identification and robust RL.
arXiv Detail & Related papers (2022-02-14T20:06:28Z)
- Decision-Making under On-Ramp merge Scenarios by Distributional Soft Actor-Critic Algorithm [10.258474373022075]
We propose an RL-based end-to-end decision-making method under a framework of offline training and online correction, called the Shielded Distributional Soft Actor-critic (SDSAC)
The results show that SDSAC achieves the best safety performance among the baseline algorithms while maintaining efficient driving.
arXiv Detail & Related papers (2021-03-08T03:57:32Z)
- Safe Distributional Reinforcement Learning [19.607668635077495]
Safety in reinforcement learning (RL) is a key property in both training and execution in many domains such as autonomous driving or finance.
We formalize it with a constrained RL formulation in the distributional RL setting.
We empirically validate our propositions on artificial and real domains against appropriate state-of-the-art safe RL algorithms.
arXiv Detail & Related papers (2021-02-26T13:03:27Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)