Safety and Liveness Guarantees through Reach-Avoid Reinforcement
Learning
- URL: http://arxiv.org/abs/2112.12288v1
- Date: Thu, 23 Dec 2021 00:44:38 GMT
- Title: Safety and Liveness Guarantees through Reach-Avoid Reinforcement
Learning
- Authors: Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J. Tomlin, Jaime F.
Fisac
- Abstract summary: Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
- Score: 24.56889192688925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reach-avoid optimal control problems, in which the system must reach certain
goal conditions while staying clear of unacceptable failure modes, are central
to safety and liveness assurance for autonomous robotic systems, but their
exact solutions are intractable for complex dynamics and environments. Recent
successes in reinforcement learning methods to approximately solve optimal
control problems with performance objectives make their application to
certification problems attractive; however, the Lagrange-type objective used in
reinforcement learning is not suitable to encode temporal logic requirements.
Recent work has shown promise in extending the reinforcement learning machinery
to safety-type problems, whose objective is not a sum, but a minimum (or
maximum) over time. In this work, we generalize the reinforcement learning
formulation to handle all optimal control problems in the reach-avoid category.
We derive a time-discounted reach-avoid Bellman backup with contraction mapping
properties and prove that the resulting reach-avoid Q-learning algorithm
converges under analogous conditions to the traditional Lagrange-type problem,
yielding an arbitrarily tight conservative approximation to the reach-avoid
set. We further demonstrate the use of this formulation with deep reinforcement
learning methods, retaining zero-violation guarantees by treating the
approximate solutions as untrusted oracles in a model-predictive supervisory
control framework. We evaluate our proposed framework on a range of nonlinear
systems, validating the results against analytic and numerical solutions, and
through Monte Carlo simulation in previously intractable problems. Our results
open the door to a range of learning-based methods for safe-and-live autonomous
behavior, with applications across robotics and automation. See
https://github.com/SafeRoboticsLab/safety_rl for code and supplementary
material.
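As a concrete illustration of the backup described above, the sketch below implements a tabular reach-avoid Q-learning loop. It is only a minimal sketch, not the authors' implementation (that lives in the linked safety_rl repository): the toy environment interface (env.reset(), env.step()), the margin functions l_fn and g_fn, and all hyperparameter values are assumptions made for illustration, and it assumes a sign convention in which l(s) >= 0 inside the target set and g(s) >= 0 outside the failure set, so that states with nonnegative learned value approximate the reach-avoid set as the discount factor approaches 1.
```python
# Illustrative sketch only: tabular reach-avoid Q-learning with a
# time-discounted reach-avoid backup. Assumed sign convention:
#   l(s) >= 0  iff s is inside the target set,
#   g(s) >= 0  iff s is outside the failure set,
# so {s : max_a Q(s, a) >= 0} approximates the reach-avoid set as gamma -> 1.
import numpy as np


def reach_avoid_backup(q_next_max, l_s, g_s, gamma):
    """Discounted reach-avoid backup using the current state's margins.

    Blends the terminal outcome min(l, g) with the recursive value
    min(g, max(l, V')); the approximation tightens as gamma approaches 1.
    """
    return (1.0 - gamma) * min(l_s, g_s) + gamma * min(g_s, max(l_s, q_next_max))


def reach_avoid_q_learning(env, l_fn, g_fn, n_states, n_actions,
                           gamma=0.99, alpha=0.1, eps=0.1,
                           episodes=5000, horizon=200, seed=0):
    """Tabular reach-avoid Q-learning with epsilon-greedy exploration.

    `env` is a hypothetical toy interface with reset() -> state and
    step(state, action) -> next_state; l_fn/g_fn return the scalar margins.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            # Epsilon-greedy action selection over the current Q estimates.
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next = env.step(s, a)
            # Reach-avoid target replaces the usual sum-of-rewards TD target.
            target = reach_avoid_backup(np.max(Q[s_next]), l_fn(s), g_fn(s), gamma)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```
In a supervisory scheme of the kind described in the abstract, such a learned Q-function would act only as an untrusted oracle: a nominal performance controller is monitored, and control falls back to the greedy reach-avoid action argmax_a Q(s, a) whenever the predicted value margin of the nominal plan drops below a chosen threshold.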
Related papers
- Learning to Boost the Performance of Stable Nonlinear Systems [0.0]
We tackle the performance-boosting problem with closed-loop stability guarantees.
Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems.
arXiv Detail & Related papers (2024-05-01T21:11:29Z)
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike a fixed exploration-exploitation balance, caution and probing are employed automatically by the controller in real time, even after the learning process has terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Adaptive Robust Model Predictive Control via Uncertainty Cancellation [25.736296938185074]
We propose a learning-based robust predictive control algorithm that compensates for significant uncertainty in the dynamics.
We optimize over a class of nonlinear feedback policies inspired by certainty equivalent "estimate-and-cancel" control laws.
arXiv Detail & Related papers (2022-12-02T18:54:23Z)
- Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions [35.9713619595494]
Reinforcement Learning and continuous nonlinear control have been successfully deployed in multiple domains of complicated sequential decision-making tasks.
Given the exploratory nature of the learning process and the presence of model uncertainty, it is challenging to apply them to safety-critical control tasks.
We propose a provably efficient episodic safe learning framework for online control tasks.
arXiv Detail & Related papers (2022-07-29T00:54:35Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)