Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
- URL: http://arxiv.org/abs/2007.03964v1
- Date: Wed, 8 Jul 2020 08:43:14 GMT
- Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
- Authors: Adam Stooke, Joshua Achiam, and Pieter Abbeel
- Abstract summary: Lagrangian methods exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior.
We propose a novel Lagrange multiplier update method that utilizes derivatives of the constraint function.
We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lagrangian methods are widely used algorithms for constrained optimization
problems, but their learning dynamics exhibit oscillations and overshoot which,
when applied to safe reinforcement learning, lead to constraint-violating
behavior during agent training. We address this shortcoming by proposing a
novel Lagrange multiplier update method that utilizes derivatives of the
constraint function. We take a controls perspective, wherein the traditional
Lagrange multiplier update behaves as \emph{integral} control; our terms
introduce \emph{proportional} and \emph{derivative} control, achieving
favorable learning dynamics through damping and predictive measures. We apply
our PID Lagrangian methods in deep RL, setting a new state of the art in Safety
Gym, a safe RL benchmark. Lastly, we introduce a new method to ease controller
tuning by providing invariance to the relative numerical scales of reward and
cost. Our extensive experiments demonstrate improved performance and
hyperparameter robustness, while our algorithms remain nearly as simple to
derive and implement as the traditional Lagrangian approach.
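The abstract's controls view admits a compact illustration: the traditional Lagrangian update raises the multiplier in proportion to the accumulated constraint violation (integral control), and the paper adds proportional and derivative terms computed from the constraint value. Below is a minimal Python sketch of such a PID-style multiplier update; the class name, gain values, and exact placement of the gains are illustrative assumptions, not the paper's exact algorithm or code.

```python
# Minimal sketch of a PID-style Lagrange multiplier update (names and gains are
# illustrative assumptions, not the paper's exact parameterization).
class PIDLagrangian:
    def __init__(self, cost_limit, kp=0.05, ki=0.01, kd=0.05):
        self.cost_limit = cost_limit    # constraint threshold d on expected cost
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0             # accumulated violation (integral term)
        self.prev_cost = None           # last measured cost (for the derivative term)

    def update(self, episode_cost):
        error = episode_cost - self.cost_limit   # current constraint violation
        # Integral term: accumulate violation, projected to stay non-negative.
        self.integral = max(0.0, self.integral + self.ki * error)
        # Derivative term: react only to rising cost, anticipating future violations.
        derivative = 0.0 if self.prev_cost is None else max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # Multiplier is the non-negative sum of the P, I, and D contributions.
        return max(0.0, self.kp * error + self.integral + self.kd * derivative)


# Example: drive the multiplier from a stream of measured episode costs.
controller = PIDLagrangian(cost_limit=25.0)
for cost in [40.0, 32.0, 27.0, 24.0]:
    lam = controller.update(cost)
```

The returned multiplier would then weight the cost term in the policy objective (e.g., maximizing reward minus lam times cost at each iteration); setting kp = kd = 0 recovers the traditional integral-only Lagrangian ascent.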
Related papers
- On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization [16.40968330148623]
This paper proposes the $\nu$PI algorithm and contributes an optimization perspective on Lagrange multiplier updates based on PI controllers.
We provide theoretical and empirical insights explaining the inability of momentum methods to address the shortcomings of gradient descent-ascent.
We prove that $\nu$PI generalizes popular momentum methods for single-objective minimization.
arXiv Detail & Related papers (2024-06-07T00:13:31Z)
- One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based scenarios.
arXiv Detail & Related papers (2024-05-29T22:12:52Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum conservation in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian [5.686699342802045]
We propose a separated proportional-integral Lagrangian algorithm to enhance RL safety under uncertainty.
We demonstrate our method can reduce the oscillations and conservatism of RL policy in a car-following simulation.
arXiv Detail & Related papers (2021-08-26T07:34:14Z)
- Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
arXiv Detail & Related papers (2021-02-22T14:28:03Z)
- Separated Proportional-Integral Lagrangian for Chance Constrained Reinforcement Learning [6.600423613245076]
Safety is essential for reinforcement learning applied in real-world tasks like autonomous driving.
Chance constraints, which guarantee the satisfaction of state constraints with high probability, are suitable for representing such requirements.
Existing chance constrained RL methods like the penalty method and the Lagrangian method either exhibit periodic oscillations or cannot satisfy the constraints.
arXiv Detail & Related papers (2021-02-17T02:40:01Z)
- Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ the neural network ensemble model to estimate the prediction uncertainty and use model predictive control as the basic control framework.
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-15T18:19:35Z)
- Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.