FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural
Network-Based Optimizer
- URL: http://arxiv.org/abs/2006.11419v4
- Date: Wed, 5 May 2021 23:42:55 GMT
- Title: FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural
Network-Based Optimizer
- Authors: Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How
- Abstract summary: We take constraints as Lyapunov functions and impose new linear constraints on the policy parameters' updating dynamics.
Because the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable.
- Score: 44.65622657676026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates reinforcement learning with constraints, which are
indispensable in safety-critical environments. To drive the constraint
violation to decrease monotonically, we take the constraints as Lyapunov functions
and impose new linear constraints on the policy parameters' updating dynamics.
As a result, the original safety set can be forward-invariant. However, because
the new guaranteed-feasible constraints are imposed on the updating dynamics
instead of the original policy parameters, classic optimization algorithms are
no longer applicable. To address this, we propose to learn a generic deep
neural network (DNN)-based optimizer to optimize the objective while satisfying
the linear constraints. The constraint-satisfaction is achieved via projection
onto a polytope formulated by multiple linear inequality constraints, which can
be solved analytically with our newly designed metric. To the best of our
knowledge, this is the first DNN-based optimizer for constrained
optimization with the forward invariance guarantee. We show that our optimizer
trains a policy to decrease the constraint violation and maximize the
cumulative reward monotonically. Results on numerical constrained optimization
and obstacle-avoidance navigation validate the theoretical findings.
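To make the mechanism concrete, the following is a minimal Python sketch of the projection step described above, under simplifying assumptions: each constraint value g_i(theta) is treated as a Lyapunov function, the update delta is required to satisfy grad g_i(theta)^T delta <= -alpha * g_i(theta), and the proposed update is projected onto that polytope. The function names, the decay rate alpha, and the use of a Euclidean projection solved with SciPy's SLSQP are illustrative choices only; the paper instead designs a metric under which the projection has an analytic solution.

import numpy as np
from scipy.optimize import minimize

def project_update(d, grads, g_vals, alpha=0.5):
    # Project a proposed update d onto {delta : grads @ delta <= -alpha * g_vals}.
    #   d      : update proposed by the (learned) optimizer, shape (n,)
    #   grads  : stacked constraint gradients, rows are grad g_i(theta), shape (m, n)
    #   g_vals : current constraint values g_i(theta), shape (m,)
    # A Euclidean projection solved numerically is used here purely for illustration.
    cons = {"type": "ineq",
            "fun": lambda delta: -alpha * g_vals - grads @ delta}  # feasible iff >= 0
    res = minimize(lambda delta: 0.5 * np.sum((delta - d) ** 2),
                   x0=np.zeros_like(d), constraints=[cons], method="SLSQP")
    return res.x

# Toy usage: a single constraint g(theta) = ||theta||^2 - 1, currently violated.
theta = np.array([1.2, 0.0])
g = np.array([theta @ theta - 1.0])     # 0.44 > 0, so the constraint is violated
grad_g = (2 * theta).reshape(1, -1)
proposed = np.array([0.1, 0.05])        # hypothetical step from the DNN optimizer
safe_step = project_update(proposed, grad_g, g)
print(safe_step)  # satisfies grad_g @ step <= -alpha * g, so g decreases along it

To first order, every constraint value then decreases whenever it is positive, which is the mechanism behind the forward-invariance claim in the abstract.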
Related papers
- Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications in advance due to the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ a Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
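As a rough illustration of the mechanism the title refers to, below is a toy Python sketch of a classical augmented Lagrangian loop for an equality-constrained problem. The toy objective, constraint, step sizes, and variable names are illustrative assumptions; the paper's stochastic variant and the way it constrains DNN training are more involved.

import numpy as np

# Minimize f(w) = ||w - t||^2 subject to c(w) = sum(w) - 1 = 0.
t = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
lam, rho, lr = 0.0, 10.0, 0.01   # multiplier, penalty weight, inner step size

for outer in range(20):
    for _ in range(200):  # approximately minimize the augmented Lagrangian in w
        c = w.sum() - 1.0
        # gradient of ||w - t||^2 + lam * c + (rho / 2) * c^2
        grad = 2 * (w - t) + (lam + rho * c) * np.ones(3)
        w -= lr * grad
    lam += rho * (w.sum() - 1.0)  # dual (multiplier) ascent step

print(w, w.sum())  # w.sum() approaches 1 as the multiplier converges

Roughly speaking, a stochastic variant replaces the exact inner minimization with mini-batch gradient steps; see the paper for the precise algorithm.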
arXiv Detail & Related papers (2023-10-25T13:55:35Z)
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
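For intuition, here is a minimal PyTorch-style sketch of a penalized clipped-surrogate loss in the spirit of the summary above. The ReLU penalty, the coefficient kappa, the clipping range eps, and all variable names are illustrative assumptions rather than the exact P3O objective defined in the paper.

import torch

def penalized_clipped_loss(ratio, adv_r, adv_c, cost_excess, kappa=10.0, eps=0.2):
    # ratio        : pi_theta(a|s) / pi_old(a|s) for the sampled actions
    # adv_r, adv_c : reward and cost advantages estimated under pi_old
    # cost_excess  : estimate of J_C(pi_old) - d, how far the old policy exceeds the budget
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # standard PPO clipped surrogate for the reward (maximized, hence negated below)
    reward_surr = torch.min(ratio * adv_r, clipped * adv_r).mean()
    # pessimistic clipped surrogate for the change in expected cost
    cost_surr = torch.max(ratio * adv_c, clipped * adv_c).mean()
    # exact-penalty term: active only when the surrogate cost exceeds the budget
    penalty = torch.relu(cost_surr + cost_excess)
    return -reward_surr + kappa * penalty

With kappa chosen large enough, minimizing such a penalized loss behaves like enforcing the cost constraint directly, which is the sense in which the penalty eliminates the constraint.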
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
- A Surrogate Objective Framework for Prediction+Optimization with Soft
Constraints [29.962390392493507]
Decision-focused prediction approaches, such as SPO+ and direct optimization, have been proposed to bridge the gap between the prediction objective and the downstream optimization goal.
This paper proposes a novel analytically differentiable surrogate objective framework for real-world linear and semi-definite negative quadratic programming problems.
arXiv Detail & Related papers (2021-11-22T17:09:57Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
- Nonconvex sparse regularization for deep neural networks and its
optimality [1.9798034349981162]
Deep neural network (DNN) estimators can attain optimal convergence rates for regression and classification problems.
We propose a novel penalized estimation method for sparse DNNs.
We prove that the sparse-penalized estimator can adaptively attain minimax convergence rates for various nonparametric regression problems.
arXiv Detail & Related papers (2020-03-26T07:15:28Z)
- Neural Networks for Encoding Dynamic Security-Constrained Optimal Power
Flow [0.0]
This paper introduces a framework to capture previously intractable optimization constraints and transform them to a mixed-integer linear program.
We demonstrate our approach for power system operation considering N-1 security and small-signal stability, showing how it can efficiently obtain cost-optimal solutions.
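To see how a trained network can be turned into mixed-integer linear constraints, here is a small Python/PuLP sketch using the standard big-M encoding of a single ReLU layer. The weights, bounds, threshold, and toy cost are illustrative assumptions and are unrelated to the power system cases studied in the paper.

import numpy as np
import pulp

# Toy pretrained weights of one ReLU layer z = max(0, W x + b) (illustrative values).
W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, -1.0])
M = 100.0  # big-M bound assumed valid for the pre-activations given the x bounds

prob = pulp.LpProblem("nn_constraint_encoding", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", lowBound=-10, upBound=10) for i in range(2)]
z = [pulp.LpVariable(f"z{j}", lowBound=0) for j in range(2)]
a = [pulp.LpVariable(f"a{j}", cat="Binary") for j in range(2)]

for j in range(2):
    pre = pulp.lpSum(W[j, i] * x[i] for i in range(2)) + b[j]
    prob += z[j] >= pre                   # z >= Wx + b
    prob += z[j] <= pre + M * (1 - a[j])  # z <= Wx + b + M(1 - a)
    prob += z[j] <= M * a[j]              # z = 0 when the unit is inactive (a = 0)

# Require a learned "secure" output to exceed a threshold, then minimize a toy cost.
prob += z[0] >= 0.5
prob += x[0] + x[1]
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([pulp.value(v) for v in x])

The binary variables select the active/inactive branch of each ReLU, which is what makes the learned constraint expressible inside a mixed-integer linear program.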
arXiv Detail & Related papers (2020-03-17T21:01:17Z)