Learning Predictive Safety Filter via Decomposition of Robust Invariant Set
- URL: http://arxiv.org/abs/2311.06769v1
- Date: Sun, 12 Nov 2023 08:11:28 GMT
- Title: Learning Predictive Safety Filter via Decomposition of Robust Invariant Set
- Authors: Zeyang Li, Chuxiong Hu, Weiye Zhao, Changliu Liu
- Abstract summary: This paper presents a theoretical framework that bridges the
advantages of both RMPC and RL to synthesize safety filters for nonlinear systems.
We propose a policy iteration approach for robust reach-avoid problems and
establish its monotone convergence.
- Score: 6.94348936509225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring safety of nonlinear systems under model uncertainty and external
disturbances is crucial, especially for real-world control tasks. Predictive
methods such as robust model predictive control (RMPC) require solving
nonconvex optimization problems online, which leads to high computational
burden and poor scalability. Reinforcement learning (RL) works well with
complex systems, but pays the price of losing rigorous safety guarantees. This
paper presents a theoretical framework that bridges the advantages of both RMPC
and RL to synthesize safety filters for nonlinear systems with state- and
action-dependent uncertainty. We decompose the robust invariant set (RIS) into
two parts: a target set that aligns with the terminal region design of RMPC, and
a reach-avoid set that accounts for the rest of the RIS. We propose a policy
iteration approach for robust reach-avoid problems and establish its monotone
convergence. This method sets the stage for an adversarial actor-critic deep RL
algorithm, which simultaneously synthesizes a reach-avoid policy network, a
disturbance policy network, and a reach-avoid value network. The learned
reach-avoid policy network is utilized to generate nominal trajectories for
online verification, which filters potentially unsafe actions that may drive
the system into unsafe regions when worst-case disturbances are applied. We
formulate a second-order cone programming (SOCP) approach for online
verification using system level synthesis, which optimizes the worst-case
reach-avoid value over all possible trajectories. The proposed safety filter
requires much lower computational complexity than RMPC while still enjoying a
persistent robust safety guarantee. The effectiveness of our method is
illustrated through a numerical example.
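
To make the learning component concrete, the following is a minimal sketch of the adversarial actor-critic scheme the abstract describes: a reach-avoid value network, a control policy network, and an adversarial disturbance policy network, trained with a robust reach-avoid Bellman backup. We adopt one common convention from the reach-avoid RL literature, $V(x) = \max\{g(x),\, \min\{\ell(x),\, \min_{u}\max_{w} V(f(x,u,w))\}\}$, where $\ell(x) \le 0$ inside the target set and $g(x) > 0$ inside the unsafe region, so $V(x) \le 0$ certifies membership in the reach-avoid set. The toy dynamics, margin functions, disturbance bound, and architectures below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an adversarial actor-critic for robust reach-avoid
# (PyTorch). Dynamics, margins, bounds, and network sizes are assumptions
# chosen for illustration; they are NOT the paper's implementation.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(),
                         nn.Linear(64, 64), nn.Tanh(),
                         nn.Linear(64, out_dim))

V  = mlp(2, 1)   # reach-avoid value network V(x)
pi = mlp(2, 1)   # reach-avoid (control) policy network
xi = mlp(2, 1)   # adversarial disturbance policy network
opt_V  = torch.optim.Adam(V.parameters(),  lr=1e-3)
opt_pi = torch.optim.Adam(pi.parameters(), lr=1e-4)
opt_xi = torch.optim.Adam(xi.parameters(), lr=1e-4)

def f(x, u, w):
    # Toy double integrator, x = [position, velocity]; w disturbs the input.
    pos, vel = x[:, :1], x[:, 1:]
    return torch.cat([pos + 0.1 * vel, vel + 0.1 * (u + w)], dim=1)

def l(x):  # target margin: l(x) <= 0 inside the target set (assumption)
    return x.norm(dim=1, keepdim=True) - 0.5

def g(x):  # constraint margin: g(x) > 0 inside the unsafe set (assumption)
    return x.norm(dim=1, keepdim=True) - 2.0

for step in range(5000):
    x = 4.0 * torch.rand(256, 2) - 2.0   # sample states in [-2, 2]^2

    # Critic: fixed-point backup V(x) = max{g(x), min{l(x), V(f(x, u, w))}}.
    with torch.no_grad():
        u, w = torch.tanh(pi(x)), 0.1 * torch.tanh(xi(x))
        target = torch.max(g(x), torch.min(l(x), V(f(x, u, w))))
    loss_V = ((V(x) - target) ** 2).mean()
    opt_V.zero_grad(); loss_V.backward(); opt_V.step()

    # Control actor: push the next-state value down (reach the target while
    # avoiding the unsafe set), holding the adversary fixed.
    loss_pi = V(f(x, torch.tanh(pi(x)), 0.1 * torch.tanh(xi(x)).detach())).mean()
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()

    # Adversary: push the value up (worst-case bounded disturbance).
    loss_xi = -V(f(x, torch.tanh(pi(x)).detach(), 0.1 * torch.tanh(xi(x)))).mean()
    opt_xi.zero_grad(); loss_xi.backward(); opt_xi.step()
```

Under this sign convention, a state with $V(x) \le 0$ can be steered into the target set without entering the unsafe set for any admissible disturbance; in the paper, the SOCP-based online verification step plays the analogous certifying role along nominal trajectories generated by the learned policy.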
Related papers
- Neural Port-Hamiltonian Models for Nonlinear Distributed Control: An Unconstrained Parametrization Approach [0.0]
Neural Networks (NNs) can be leveraged to parametrize control policies that yield good performance.
However, NNs' sensitivity to small input changes poses a risk of destabilizing the closed-loop system.
To address these problems, we leverage the framework of port-Hamiltonian systems to design continuous-time distributed control policies.
The effectiveness of the proposed distributed controllers is demonstrated through consensus control of non-holonomic mobile robots.
arXiv Detail & Related papers (2024-11-15T10:44:29Z) - Augmented Lagrangian-Based Safe Reinforcement Learning Approach for Distribution System Volt/VAR Control [1.1059341532498634]
This paper formulates the Volt/VAR control problem as a constrained Markov decision process (CMDP).
A novel safe off-policy reinforcement learning (RL) approach is proposed in this paper to solve the CMDP.
A two-stage strategy is adopted for offline training and online execution, so the accurate distribution system model is no longer needed.
arXiv Detail & Related papers (2024-10-19T19:45:09Z) - Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning [7.349727826230864]
We present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents.
The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function.
We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward.
arXiv Detail & Related papers (2024-05-04T20:59:06Z) - ConstrainedZero: Chance-Constrained POMDP Planning using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints [34.9739641898452]
This work introduces the ConstrainedZero policy algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy.
Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.
arXiv Detail & Related papers (2024-05-01T17:17:22Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - CaRT: Certified Safety and Robust Tracking in Learning-based Motion Planning for Multi-Agent Systems [7.77024796789203]
CaRT is a new hierarchical, distributed architecture to guarantee the safety and robustness of a learning-based motion planning policy.
We show that CaRT guarantees safety and exponential boundedness of the trajectory tracking error, even in the presence of deterministic and bounded disturbances.
We demonstrate the effectiveness of CaRT in several examples of nonlinear motion planning and control problems, including optimal, multi-spacecraft reconfiguration.
arXiv Detail & Related papers (2023-07-13T21:51:29Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control.
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)