FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural
Network-Based Optimizer
- URL: http://arxiv.org/abs/2006.11419v4
- Date: Wed, 5 May 2021 23:42:55 GMT
- Title: FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural
Network-Based Optimizer
- Authors: Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How
- Abstract summary: We take constraints as Lyapunov functions and impose new linear constraints on the policy parameters' updating dynamics.
Because the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable.
- Score: 44.65622657676026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates reinforcement learning with constraints, which are
indispensable in safety-critical environments. To drive the constraint
violation to decrease monotonically, we take the constraints as Lyapunov functions
and impose new linear constraints on the policy parameters' updating dynamics.
As a result, the original safety set can be forward-invariant. However, because
the new guaranteed-feasible constraints are imposed on the updating dynamics
instead of the original policy parameters, classic optimization algorithms are
no longer applicable. To address this, we propose to learn a generic deep
neural network (DNN)-based optimizer to optimize the objective while satisfying
the linear constraints. The constraint-satisfaction is achieved via projection
onto a polytope formulated by multiple linear inequality constraints, which can
be solved analytically with our newly designed metric. To the best of our
knowledge, this is the first DNN-based optimizer for constrained
optimization with the forward invariance guarantee. We show that our optimizer
trains a policy to decrease the constraint violation and maximize the
cumulative reward monotonically. Results on numerical constrained optimization
and obstacle-avoidance navigation validate the theoretical findings.
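To make the mechanism concrete, the following is a minimal Python sketch of the projection step described above, under simplifying assumptions: each constraint value g_i(theta) is treated as a Lyapunov function, the update delta is required to satisfy grad g_i(theta)^T delta <= -alpha * g_i(theta), and the proposed update is projected onto that polytope. The function names, the decay rate alpha, and the use of a Euclidean projection solved with SciPy's SLSQP are illustrative choices only; the paper instead designs a metric under which the projection has an analytic solution.

import numpy as np
from scipy.optimize import minimize

def project_update(d, grads, g_vals, alpha=0.5):
    # Project a proposed update d onto {delta : grads @ delta <= -alpha * g_vals}.
    #   d      : update proposed by the (learned) optimizer, shape (n,)
    #   grads  : stacked constraint gradients, rows are grad g_i(theta), shape (m, n)
    #   g_vals : current constraint values g_i(theta), shape (m,)
    # A Euclidean projection solved numerically is used here purely for illustration.
    cons = {"type": "ineq",
            "fun": lambda delta: -alpha * g_vals - grads @ delta}  # feasible iff >= 0
    res = minimize(lambda delta: 0.5 * np.sum((delta - d) ** 2),
                   x0=np.zeros_like(d), constraints=[cons], method="SLSQP")
    return res.x

# Toy usage: a single constraint g(theta) = ||theta||^2 - 1, currently violated.
theta = np.array([1.2, 0.0])
g = np.array([theta @ theta - 1.0])     # 0.44 > 0, so the constraint is violated
grad_g = (2 * theta).reshape(1, -1)
proposed = np.array([0.1, 0.05])        # hypothetical step from the DNN optimizer
safe_step = project_update(proposed, grad_g, g)
print(safe_step)  # satisfies grad_g @ step <= -alpha * g, so g decreases along it

To first order, every constraint value then decreases whenever it is positive, which is the mechanism behind the forward-invariance claim in the abstract.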
Related papers
- Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications in advance due to the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ a Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
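As a rough illustration of the mechanism the title refers to, below is a toy Python sketch of a classical augmented Lagrangian loop for an equality-constrained problem. The toy objective, constraint, step sizes, and variable names are illustrative assumptions; the paper's stochastic variant and the way it constrains DNN training are more involved.

import numpy as np

# Minimize f(w) = ||w - t||^2 subject to c(w) = sum(w) - 1 = 0.
t = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
lam, rho, lr = 0.0, 10.0, 0.01   # multiplier, penalty weight, inner step size

for outer in range(20):
    for _ in range(200):  # approximately minimize the augmented Lagrangian in w
        c = w.sum() - 1.0
        # gradient of ||w - t||^2 + lam * c + (rho / 2) * c^2
        grad = 2 * (w - t) + (lam + rho * c) * np.ones(3)
        w -= lr * grad
    lam += rho * (w.sum() - 1.0)  # dual (multiplier) ascent step

print(w, w.sum())  # w.sum() approaches 1 as the multiplier converges

Roughly speaking, a stochastic variant replaces the exact inner minimization with mini-batch gradient steps; see the paper for the precise algorithm.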
arXiv Detail & Related papers (2023-10-25T13:55:35Z)
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
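For intuition, here is a minimal PyTorch-style sketch of a penalized clipped-surrogate loss in the spirit of the summary above. The ReLU penalty, the coefficient kappa, the clipping range eps, and all variable names are illustrative assumptions rather than the exact P3O objective defined in the paper.

import torch

def penalized_clipped_loss(ratio, adv_r, adv_c, cost_excess, kappa=10.0, eps=0.2):
    # ratio        : pi_theta(a|s) / pi_old(a|s) for the sampled actions
    # adv_r, adv_c : reward and cost advantages estimated under pi_old
    # cost_excess  : estimate of J_C(pi_old) - d, how far the old policy exceeds the budget
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # standard PPO clipped surrogate for the reward (maximized, hence negated below)
    reward_surr = torch.min(ratio * adv_r, clipped * adv_r).mean()
    # pessimistic clipped surrogate for the change in expected cost
    cost_surr = torch.max(ratio * adv_c, clipped * adv_c).mean()
    # exact-penalty term: active only when the surrogate cost exceeds the budget
    penalty = torch.relu(cost_surr + cost_excess)
    return -reward_surr + kappa * penalty

With kappa chosen large enough, minimizing such a penalized loss behaves like enforcing the cost constraint directly, which is the sense in which the penalty eliminates the constraint.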
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
- A Surrogate Objective Framework for Prediction+Optimization with Soft
Constraints [29.962390392493507]
Decision-focused prediction approaches, such as SPO+ and direct optimization, have been proposed to bridge the gap between the prediction objective and the downstream optimization goal.
This paper proposes a novel analytically differentiable surrogate objective framework for real-world linear and semi-definite negative quadratic programming problems.
arXiv Detail & Related papers (2021-11-22T17:09:57Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
- Nonconvex sparse regularization for deep neural networks and its
optimality [1.9798034349981162]
Deep neural network (DNN) estimators can attain optimal convergence rates for regression and classification problems.
We propose a novel penalized estimation method for sparse DNNs.
We prove that the sparse-penalized estimator can adaptively attain minimax convergence rates for various nonparametric regression problems.
arXiv Detail & Related papers (2020-03-26T07:15:28Z)
- Neural Networks for Encoding Dynamic Security-Constrained Optimal Power
Flow [0.0]
This paper introduces a framework to capture previously intractable optimization constraints and transform them to a mixed-integer linear program.
We demonstrate our approach for power system operation considering N-1 security and small-signal stability, showing how it can efficiently obtain cost-optimal solutions.
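To see how a trained network can be turned into mixed-integer linear constraints, here is a small Python/PuLP sketch using the standard big-M encoding of a single ReLU layer. The weights, bounds, threshold, and toy cost are illustrative assumptions and are unrelated to the power system cases studied in the paper.

import numpy as np
import pulp

# Toy pretrained weights of one ReLU layer z = max(0, W x + b) (illustrative values).
W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, -1.0])
M = 100.0  # big-M bound assumed valid for the pre-activations given the x bounds

prob = pulp.LpProblem("nn_constraint_encoding", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", lowBound=-10, upBound=10) for i in range(2)]
z = [pulp.LpVariable(f"z{j}", lowBound=0) for j in range(2)]
a = [pulp.LpVariable(f"a{j}", cat="Binary") for j in range(2)]

for j in range(2):
    pre = pulp.lpSum(W[j, i] * x[i] for i in range(2)) + b[j]
    prob += z[j] >= pre                   # z >= Wx + b
    prob += z[j] <= pre + M * (1 - a[j])  # z <= Wx + b + M(1 - a)
    prob += z[j] <= M * a[j]              # z = 0 when the unit is inactive (a = 0)

# Require a learned "secure" output to exceed a threshold, then minimize a toy cost.
prob += z[0] >= 0.5
prob += x[0] + x[1]
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([pulp.value(v) for v in x])

The binary variables select the active/inactive branch of each ReLU, which is what makes the learned constraint expressible inside a mixed-integer linear program.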
arXiv Detail & Related papers (2020-03-17T21:01:17Z)