Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
- URL: http://arxiv.org/abs/2007.03964v1
- Date: Wed, 8 Jul 2020 08:43:14 GMT
- Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
- Authors: Adam Stooke, Joshua Achiam, and Pieter Abbeel
- Abstract summary: Lagrangian methods exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior.
We propose a novel Lagrange multiplier update method that utilizes derivatives of the constraint function.
We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lagrangian methods are widely used algorithms for constrained optimization
problems, but their learning dynamics exhibit oscillations and overshoot which,
when applied to safe reinforcement learning, lead to constraint-violating
behavior during agent training. We address this shortcoming by proposing a
novel Lagrange multiplier update method that utilizes derivatives of the
constraint function. We take a controls perspective, wherein the traditional
Lagrange multiplier update behaves as \emph{integral} control; our terms
introduce \emph{proportional} and \emph{derivative} control, achieving
favorable learning dynamics through damping and predictive measures. We apply
our PID Lagrangian methods in deep RL, setting a new state of the art in Safety
Gym, a safe RL benchmark. Lastly, we introduce a new method to ease controller
tuning by providing invariance to the relative numerical scales of reward and
cost. Our extensive experiments demonstrate improved performance and
hyperparameter robustness, while our algorithms remain nearly as simple to
derive and implement as the traditional Lagrangian approach.
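The abstract's controls view admits a compact illustration: the traditional Lagrangian update raises the multiplier in proportion to the accumulated constraint violation (integral control), and the paper adds proportional and derivative terms computed from the constraint value. Below is a minimal Python sketch of such a PID-style multiplier update; the class name, gain values, and exact placement of the gains are illustrative assumptions, not the paper's exact algorithm or code.

```python
# Minimal sketch of a PID-style Lagrange multiplier update (names and gains are
# illustrative assumptions, not the paper's exact parameterization).
class PIDLagrangian:
    def __init__(self, cost_limit, kp=0.05, ki=0.01, kd=0.05):
        self.cost_limit = cost_limit    # constraint threshold d on expected cost
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0             # accumulated violation (integral term)
        self.prev_cost = None           # last measured cost (for the derivative term)

    def update(self, episode_cost):
        error = episode_cost - self.cost_limit   # current constraint violation
        # Integral term: accumulate violation, projected to stay non-negative.
        self.integral = max(0.0, self.integral + self.ki * error)
        # Derivative term: react only to rising cost, anticipating future violations.
        derivative = 0.0 if self.prev_cost is None else max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # Multiplier is the non-negative sum of the P, I, and D contributions.
        return max(0.0, self.kp * error + self.integral + self.kd * derivative)


# Example: drive the multiplier from a stream of measured episode costs.
controller = PIDLagrangian(cost_limit=25.0)
for cost in [40.0, 32.0, 27.0, 24.0]:
    lam = controller.update(cost)
```

The returned multiplier would then weight the cost term in the policy objective (e.g., maximizing reward minus lam times cost at each iteration); setting kp = kd = 0 recovers the traditional integral-only Lagrangian ascent.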
Related papers
- On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization [16.40968330148623]
This paper proposes the $\nu$PI algorithm and contributes an optimization perspective on Lagrange multiplier updates based on PI controllers.
We provide theoretical and empirical insights explaining the inability of momentum methods to address the shortcomings of gradient descent-ascent.
We prove that $\nu$PI generalizes popular momentum methods for single-objective minimization.
arXiv Detail & Related papers (2024-06-07T00:13:31Z)
- One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based scenarios.
arXiv Detail & Related papers (2024-05-29T22:12:52Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum conservation in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian [5.686699342802045]
We propose a separated proportional-integral Lagrangian algorithm to enhance RL safety under uncertainty.
We demonstrate our method can reduce the oscillations and conservatism of RL policy in a car-following simulation.
arXiv Detail & Related papers (2021-08-26T07:34:14Z)
- Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
arXiv Detail & Related papers (2021-02-22T14:28:03Z)
- Separated Proportional-Integral Lagrangian for Chance Constrained Reinforcement Learning [6.600423613245076]
Safety is essential for reinforcement learning applied in real-world tasks like autonomous driving.
Chance constraints, which guarantee the satisfaction of state constraints with high probability, are suitable for representing such requirements.
Existing chance constrained RL methods like the penalty method and the Lagrangian method either exhibit periodic oscillations or cannot satisfy the constraints.
arXiv Detail & Related papers (2021-02-17T02:40:01Z)
- Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ the neural network ensemble model to estimate the prediction uncertainty and use model predictive control as the basic control framework.
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-15T18:19:35Z)
- Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.