Safe Pontryagin Differentiable Programming
- URL: http://arxiv.org/abs/2105.14937v1
- Date: Mon, 31 May 2021 13:03:00 GMT
- Title: Safe Pontryagin Differentiable Programming
- Authors: Wanxin Jin, Shaoshuai Mou, George J. Pappas
- Abstract summary: We propose a theoretical and algorithmic safe differentiable framework to solve a broad class of safety-critical learning and control tasks.
We demonstrate the capabilities of Safe PDP in solving various safe learning and control tasks.
- Score: 17.63374326658473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a Safe Pontryagin Differentiable Programming (Safe PDP)
methodology, which establishes a theoretical and algorithmic safe
differentiable framework to solve a broad class of safety-critical learning and
control tasks -- problems that require the guarantee of both immediate and
long-term constraint satisfaction at any stage of the learning and control
process. In the spirit of interior-point methods, Safe PDP handles different
types of state and input constraints by incorporating them into the cost and
loss through barrier functions. We prove the following fundamental features of
Safe PDP: first, both the constrained solution and its gradient in the
backward pass can be approximated by solving a more efficient unconstrained counterpart;
second, the approximation of both the solution and its gradient can be
controlled to arbitrary accuracy via a barrier parameter; and third,
importantly, all intermediate results throughout the approximation and
optimization strictly respect all constraints, thus guaranteeing safety
throughout the entire learning and control process. We demonstrate the
capabilities of Safe PDP in solving various safe learning and control tasks,
including safe policy optimization, safe motion planning, and learning MPCs
from demonstrations, on challenging control systems such as a 6-DoF
maneuvering quadrotor and 6-DoF rocket-powered landing.
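As a reading aid, the interior-point construction the abstract describes can be pictured with a minimal sketch (toy problem invented for illustration; this is not the authors' implementation): inequality constraints g_i(x) <= 0 are folded into the objective through log barriers, and shrinking the barrier weight mu drives the unconstrained solution toward the constrained one while every accepted iterate stays strictly feasible.

```python
# Minimal log-barrier sketch of the interior-point idea in Safe PDP's
# abstract (illustrative only; not the paper's implementation).
import numpy as np
from scipy.optimize import minimize

def barrier_objective(x, cost, constraints, mu):
    """Unconstrained surrogate: cost(x) + mu * sum_i -log(-g_i(x)).

    The surrogate is finite only while every g_i(x) < 0, so every
    accepted iterate is strictly feasible -- mirroring the paper's
    guarantee that all intermediate results respect the constraints.
    """
    g = np.array([g_i(x) for g_i in constraints])
    if np.any(g >= 0.0):
        return np.inf  # reject any infeasible candidate outright
    return cost(x) + mu * np.sum(-np.log(-g))

# Toy problem (hypothetical): minimize ||x||^2 subject to x[0] >= 0.5,
# written as g(x) = 0.5 - x[0] <= 0.
cost = lambda x: float(x @ x)
constraints = [lambda x: 0.5 - x[0]]

x = np.array([1.0, 1.0])             # strictly feasible initial guess
for mu in [1.0, 1e-1, 1e-2, 1e-3]:   # shrink mu for higher accuracy
    x = minimize(barrier_objective, x, args=(cost, constraints, mu),
                 method="Nelder-Mead").x
print(x)  # approaches the constrained optimum [0.5, 0.0] as mu -> 0
```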
Related papers
- Pareto Control Barrier Function for Inner Safe Set Maximization Under Input Constraints [50.920465513162334]
We introduce the PCBF algorithm to maximize the inner safe set of dynamical systems under input constraints.
We validate its effectiveness through comparison with Hamilton-Jacobi reachability for an inverted pendulum and through simulations on a 12-dimensional quadrotor system.
Results show that the PCBF consistently outperforms existing methods, yielding larger safe sets and ensuring safety under input constraints.
arXiv Detail & Related papers (2024-10-05T18:45:19Z)
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
We define the safety critic, a mechanism that nullifies rewards obtained by violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
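The reward-nullification mechanism attributed to SCPO's safety critic above can be pictured with a tiny sketch (names and signatures here are invented for illustration, not taken from the paper):

```python
# Hypothetical sketch of reward nullification by a safety critic:
# rewards collected while a safety constraint is violated are zeroed,
# so unsafe behavior cannot pay off during policy optimization.
from typing import Any, Callable

def nullified_reward(reward: float, state: Any,
                     violates: Callable[[Any], bool]) -> float:
    """Return the environment reward, zeroed on constraint violation."""
    return 0.0 if violates(state) else reward

# Schematic use inside a rollout loop:
# r = nullified_reward(r_env, next_obs, violates=lambda s: abs(s[0]) > 1.0)
```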
- Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming [4.825619788907192]
We present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric.
To improve the tractability, we first use duality theory to transform the lower-level optimization from infinite-dimensional probability space to a finite-dimensional parametric space.
Using differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level problem with two sequential, computationally efficient modules.
arXiv Detail & Related papers (2023-10-03T22:05:05Z)
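"Differentiable convex programming" in the entry above refers to solvers whose solutions can be differentiated with respect to the problem data. A minimal sketch using the cvxpylayers library follows; the toy problem is illustrative and unrelated to the paper's actual lower-level optimization:

```python
# Differentiating through a convex program with cvxpylayers
# (toy problem for illustration; not the paper's formulation).
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(cp.pnorm(A @ x - b, p=1)), [x >= 0])
layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])

A_t = torch.randn(m, n, requires_grad=True)
b_t = torch.randn(m, requires_grad=True)
x_star, = layer(A_t, b_t)   # forward pass: solve the convex program
x_star.sum().backward()     # backward pass: gradients w.r.t. A_t, b_t
```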
- Safe Neural Control for Non-Affine Control Systems with Differentiable Control Barrier Functions [58.19198103790931]
This paper addresses the problem of safety-critical control for non-affine control systems.
It has been shown that optimizing quadratic costs subject to state and control constraints can be sub-optimally reduced to a sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs).
We incorporate higher-order CBFs into neural ordinary differential equation-based learning models as differentiable CBFs to guarantee safety for non-affine control systems.
arXiv Detail & Related papers (2023-09-06T05:35:48Z)
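For intuition, here is a sketch of the plain first-order, control-affine CBF-QP that this line of work builds on (the paper above generalizes to higher-order CBFs and non-affine dynamics; all symbols and the toy system are generic, solved with cvxpy):

```python
# Generic first-order CBF-QP safety filter for x_dot = f(x) + g(x) u
# (the control-affine baseline; the paper above generalizes beyond it).
import numpy as np
import cvxpy as cp

def cbf_qp_filter(u_nom, f_x, g_x, h_x, grad_h_x, alpha=1.0):
    """Minimally modify u_nom so the barrier h stays nonnegative:
       min ||u - u_nom||^2  s.t.  dh/dx (f + g u) + alpha * h >= 0.
    """
    u = cp.Variable(u_nom.shape[0])
    cbf = grad_h_x @ (f_x + g_x @ u) + alpha * h_x >= 0
    cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), [cbf]).solve()
    return u.value

# Toy single integrator x_dot = u with safe set h(x) = 1 - x >= 0:
x = 0.9
u_safe = cbf_qp_filter(u_nom=np.array([2.0]),
                       f_x=np.array([0.0]),
                       g_x=np.array([[1.0]]),
                       h_x=1.0 - x,
                       grad_h_x=np.array([-1.0]))
print(u_safe)  # clipped to about 0.1 so the state cannot leave h >= 0
```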
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions [35.9713619595494]
Reinforcement Learning and continuous nonlinear control have been successfully deployed in multiple domains of complicated sequential decision-making tasks.
Given the exploration nature of the learning process and the presence of model uncertainty, it is challenging to apply them to safety-critical control tasks.
We propose a provably efficient episodic safe learning framework for online control tasks.
arXiv Detail & Related papers (2022-07-29T00:54:35Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
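The log-barrier update underlying LBSGD can be sketched as follows; note that the paper derives a careful adaptive step-size rule, which is replaced here by a crude, clearly labeled stand-in:

```python
# Schematic log-barrier gradient step in the spirit of LBSGD.
# Constraints are g_i(x) <= 0, assumed smooth; the step-size rule is a
# conservative heuristic, not the paper's adaptive choice.
import numpy as np

def log_barrier_grad(x, grad_f, gs, grad_gs, eta):
    """Gradient of f(x) + eta * sum_i -log(-g_i(x))."""
    assert all(g(x) < 0 for g in gs), "iterate must be strictly feasible"
    return grad_f(x) + eta * sum(dg(x) / (-g(x))
                                 for g, dg in zip(gs, grad_gs))

def lbsgd_step(x, grad_f, gs, grad_gs, eta=0.1):
    d = log_barrier_grad(x, grad_f, gs, grad_gs, eta)
    # Shrink the step with the gradient norm so a single update cannot
    # jump across the constraint boundary (stand-in for the paper's rule).
    gamma = min(1e-2, 0.1 / (np.linalg.norm(d) + 1e-8))
    return x - gamma * d

# Example: f(x) = ||x||^2 with the constraint x[0] <= 0.9.
x = np.zeros(2)
for _ in range(100):
    x = lbsgd_step(x, grad_f=lambda z: 2 * z,
                   gs=[lambda z: z[0] - 0.9],
                   grad_gs=[lambda z: np.array([1.0, 0.0])])
```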
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.