Robust Safe Reinforcement Learning under Adversarial Disturbances
- URL: http://arxiv.org/abs/2310.07207v1
- Date: Wed, 11 Oct 2023 05:34:46 GMT
- Title: Robust Safe Reinforcement Learning under Adversarial Disturbances
- Authors: Zeyang Li, Chuxiong Hu, Shengbo Eben Li, Jia Cheng, Yunan Wang
- Abstract summary: Safety is a primary concern when applying reinforcement learning to real-world control tasks.
Existing safe reinforcement learning algorithms rarely account for external disturbances.
This paper proposes a robust safe reinforcement learning framework that tackles worst-case disturbances.
- Score: 12.145611442959602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety is a primary concern when applying reinforcement learning to
real-world control tasks, especially in the presence of external disturbances.
However, existing safe reinforcement learning algorithms rarely account for
external disturbances, limiting their applicability and robustness in practice.
To address this challenge, this paper proposes a robust safe reinforcement
learning framework that tackles worst-case disturbances. First, this paper
presents a policy iteration scheme to solve for the robust invariant set, i.e.,
a subset of the safe set such that persistent safety is possible only for states
within it. The key idea is to establish a two-player zero-sum game by leveraging
the safety value function in Hamilton-Jacobi reachability analysis, in which
the protagonist (i.e., control inputs) aims to maintain safety and the
adversary (i.e., external disturbances) tries to violate safety. This paper
proves that the proposed policy iteration algorithm converges monotonically to
the maximal robust invariant set. Second, this paper integrates the proposed
policy iteration scheme into a constrained reinforcement learning algorithm
that simultaneously synthesizes the robust invariant set and uses it for
constrained policy optimization. This algorithm tackles both optimality and
safety, i.e., learning a policy that attains high rewards while maintaining
safety under worst-case disturbances. Experiments on classic control tasks show
that the proposed method achieves zero constraint violation with learned
worst-case adversarial disturbances, while other baseline algorithms violate
the safety constraints substantially. Our proposed method also attains
comparable performance to the baselines even in the absence of the adversary.
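
For intuition, the sketch below illustrates the zero-sum safety fixed point that underlies the robust invariant set, on a toy discrete double integrator. It uses the common discrete-time Hamilton-Jacobi-style backup V(s) = min{ h(s), max_u min_d V(f(s, u, d)) }, where h is the constraint margin and the maximal robust invariant set is the zero-superlevel set {s : V(s) >= 0}. This is a value-iteration analogue written for brevity, not the paper's policy iteration scheme or its constrained RL algorithm, and the dynamics, bounds, and grid below are illustrative assumptions.

```python
import numpy as np

# Toy discrete "double integrator": state (p, v), control u, disturbance d.
# Illustrative dynamics (an assumption, not the paper's benchmark):
#   v' = clip(v + u + d, -V_MAX, V_MAX),   p' = clip(p + v', -P_MAX, P_MAX)
# Safety constraint |p| <= P_LIM, encoded by the margin h(p, v) = P_LIM - |p|.
P_MAX, V_MAX, P_LIM = 8, 3, 5
U_SET = range(-2, 3)   # protagonist: control inputs (more authority than the adversary)
D_SET = range(-1, 2)   # adversary: bounded external disturbances

positions = np.arange(-P_MAX, P_MAX + 1)
velocities = np.arange(-V_MAX, V_MAX + 1)

def margin(p, v):
    """Constraint margin h: nonnegative iff the state satisfies |p| <= P_LIM."""
    return P_LIM - abs(p)      # velocity does not enter the constraint here

def step(p, v, u, d):
    """One transition of the toy system under control u and disturbance d."""
    v_next = int(np.clip(v + u + d, -V_MAX, V_MAX))
    p_next = int(np.clip(p + v_next, -P_MAX, P_MAX))
    return p_next, v_next

def idx(p, v):
    """Map a state (p, v) to array indices."""
    return p + P_MAX, v + V_MAX

# Safety value function, initialized at the margin h.
V = np.array([[margin(p, v) for v in velocities] for p in positions], dtype=float)

# Fixed-point iteration on  V(s) = min{ h(s), max_u min_d V(f(s, u, d)) }.
# The protagonist (control) maximizes, the adversary (disturbance) minimizes.
for _ in range(200):
    V_new = V.copy()
    for p in positions:
        for v in velocities:
            best = max(
                min(V[idx(*step(p, v, u, d))] for d in D_SET)
                for u in U_SET
            )
            V_new[idx(p, v)] = min(margin(p, v), best)
    if np.allclose(V_new, V):
        break
    V = V_new

# Maximal robust invariant set = zero-superlevel set of the converged V.
safe_set = np.array([[margin(p, v) >= 0 for v in velocities] for p in positions])
robust_invariant = V >= 0
print("states in the safe set:            ", int(safe_set.sum()))
print("states in the robust invariant set:", int(robust_invariant.sum()))
```

On this toy grid the zero-superlevel set excludes constraint-satisfying states whose outward velocity can no longer be countered under the worst-case disturbance, which is exactly the gap between the safe set and the robust invariant set that the constrained policy optimization step is meant to respect.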
Related papers
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward-maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
At its core is the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Safe Reinforcement Learning with Dual Robustness [10.455148541147796]
Reinforcement learning (RL) agents are vulnerable to adversarial disturbances.
We propose a systematic framework to unify safe RL and robust RL.
We also design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC).
arXiv Detail & Related papers (2023-09-13T09:34:21Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Safe Policy Improvement in Constrained Markov Decision Processes [10.518340300810504]
We present a solution to the synthesis problem by solving its two main challenges: reward-shaping from a set of formal requirements and safe policy update.
For the former, we propose an automatic reward-shaping procedure, defining a scalar reward signal compliant with the task specification.
For the latter, we introduce an algorithm ensuring that the policy is improved in a safe fashion with high-confidence guarantees.
arXiv Detail & Related papers (2022-10-20T13:29:32Z) - Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violations in policy optimization tasks in safe reinforcement learning (see the log-barrier sketch after this list).
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control.
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z) - Verifiably Safe Exploration for End-to-End Reinforcement Learning [17.401496872603943]
This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs.
It is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints.
arXiv Detail & Related papers (2020-07-02T16:12:20Z) - Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks [14.461847761198037]
This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints.
Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy.
To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies.
arXiv Detail & Related papers (2020-03-20T20:32:20Z)
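
Referenced from the log-barrier entry above: a minimal, self-contained sketch of the general log-barrier idea, under illustrative assumptions. The constrained problem min f(x) s.t. g(x) <= 0 is replaced by the barrier objective B_eta(x) = f(x) - eta * log(-g(x)), and each gradient step is shrunk until the iterate stays strictly feasible, so the constraint is never violated during optimization. The objective, constraint, and backtracking rule below are not from the paper; LBSGD's actual step-size rule is derived from smoothness and noise estimates.

```python
import numpy as np

# Minimal log-barrier sketch for a smooth constrained problem
#     minimize f(x)   subject to   g(x) <= 0.
# This is NOT the paper's LBSGD algorithm: LBSGD derives an adaptive step
# size from smoothness/noise estimates, whereas here a simple backtracking
# rule keeps every iterate strictly feasible and decreases the barrier.

def f(x):            # illustrative objective (assumption)
    return (x - 3.0) ** 2

def grad_f(x):
    return 2.0 * (x - 3.0)

def g(x):            # single illustrative constraint: stay in x <= 1
    return x - 1.0

def grad_g(x):
    return 1.0

def barrier(x, eta):
    """Log-barrier objective B_eta(x) = f(x) - eta * log(-g(x)); finite only when g(x) < 0."""
    return f(x) - eta * np.log(-g(x))

def barrier_grad(x, eta):
    return grad_f(x) - eta * grad_g(x) / g(x)   # note: g(x) < 0 at feasible points

def log_barrier_descent(x0, eta=0.1, lr=0.2, iters=200):
    x = x0
    assert g(x) < 0, "the initial point must be strictly feasible"
    for _ in range(iters):
        step = -lr * barrier_grad(x, eta)
        # Shrink the step until the candidate stays strictly feasible and
        # does not increase the barrier, so the constraint is never crossed.
        while g(x + step) >= 0 or barrier(x + step, eta) > barrier(x, eta):
            step *= 0.5
        x = x + step
    return x

x_star = log_barrier_descent(x0=0.0)
print(f"barrier solution: {x_star:.3f} (constrained optimum at x = 1; "
      f"the barrier keeps a small safety margin)")
```

As eta is decreased toward zero, the barrier minimizer approaches the constrained optimum, which is the usual trade-off between conservatism and optimality in barrier methods.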