Separated Proportional-Integral Lagrangian for Chance Constrained
Reinforcement Learning
- URL: http://arxiv.org/abs/2102.08539v1
- Date: Wed, 17 Feb 2021 02:40:01 GMT
- Title: Separated Proportional-Integral Lagrangian for Chance Constrained
Reinforcement Learning
- Authors: Baiyu Peng, Yao Mu, Jingliang Duan, Yang Guan, Shengbo Eben Li, Jianyu
Chen
- Abstract summary: Safety is essential for reinforcement learning applied in real-world tasks like autonomous driving.
Chance constraints which guarantee the satisfaction of state constraints at a high probability are suitable to represent the requirements.
Existing chance constrained RL methods like the penalty method and the Lagrangian method either exhibit periodic oscillations or cannot satisfy the constraints.
- Score: 6.600423613245076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety is essential for reinforcement learning (RL) applied in real-world
tasks like autonomous driving. Chance constraints which guarantee the
satisfaction of state constraints at a high probability are suitable to
represent the requirements in real-world environment with uncertainty. Existing
chance constrained RL methods like the penalty method and the Lagrangian method
either exhibit periodic oscillations or cannot satisfy the constraints. In this
paper, we address these shortcomings by proposing a separated
proportional-integral Lagrangian (SPIL) algorithm. Taking a control
perspective, we first interpret the penalty method and the Lagrangian method as
proportional feedback and integral feedback control, respectively. Then, a
proportional-integral Lagrangian method is proposed to steady learning process
while improving safety. To prevent integral overshooting and reduce
conservatism, we introduce the integral separation technique inspired by PID
control. Finally, an analytical gradient of the chance constraint is utilized
for model-based policy optimization. The effectiveness of SPIL is demonstrated
by a narrow car-following task. Experiments indicate that compared with
previous methods, SPIL improves the performance while guaranteeing safety, with
a steady learning process.
Related papers
- Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks.
arXiv Detail & Related papers (2022-05-24T06:15:51Z) - Model-based Chance-Constrained Reinforcement Learning via Separated
Proportional-Integral Lagrangian [5.686699342802045]
We propose a separated proportional-integral Lagrangian algorithm to enhance RL safety under uncertainty.
We demonstrate our method can reduce the oscillations and conservatism of RL policy in a car-following simulation.
arXiv Detail & Related papers (2021-08-26T07:34:14Z) - Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
InReinforcement learning (RL) has shown a promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z) - Model-based Safe Reinforcement Learning using Generalized Control
Barrier Function [6.556257209888797]
This paper proposes a model-based feasibility enhancement technique of constrained RL.
By using the model information, the policy can be optimized safely without violating actual safety constraints.
The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
arXiv Detail & Related papers (2021-03-02T08:17:38Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Constrained Model-Free Reinforcement Learning for Process Optimization [0.0]
Reinforcement learning (RL) is a control approach that can handle nonlinear optimal control problems.
Despite the promise exhibited, RL has yet to see marked translation to industrial practice.
We propose an 'oracle'-assisted constrained Q-learning algorithm that guarantees the satisfaction of joint chance constraints with a high probability.
arXiv Detail & Related papers (2020-11-16T13:16:22Z) - Responsive Safety in Reinforcement Learning by PID Lagrangian Methods [74.49173841304474]
Lagrangian methods exhibit oscillations and overshoot which, when applied to safe reinforcement learning, leads to constraint-violating behavior.
We propose a novel Lagrange multiplier update method that utilizes derivatives of the constraint function.
We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark.
arXiv Detail & Related papers (2020-07-08T08:43:14Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z) - Safe reinforcement learning for probabilistic reachability and safety
specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
arXiv Detail & Related papers (2020-02-24T09:20:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.