Model-based Safe Reinforcement Learning using Generalized Control Barrier Function
- URL: http://arxiv.org/abs/2103.01556v1
- Date: Tue, 2 Mar 2021 08:17:38 GMT
- Title: Model-based Safe Reinforcement Learning using Generalized Control Barrier Function
- Authors: Haitong Ma, Jianyu Chen, Shengbo Eben Li, Ziyu Lin, Sifa Zheng
- Abstract summary: This paper proposes a model-based feasibility enhancement technique for constrained RL.
By using the model information, the policy can be optimized safely without violating actual safety constraints.
The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
- Score: 6.556257209888797
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Model information can be used to predict future trajectories, so it
has great potential for avoiding dangerous regions when applying reinforcement
learning (RL) to real-world tasks such as autonomous driving. However, existing
studies mostly use model-free constrained RL, which leads to inevitable
constraint violations. This paper proposes a model-based feasibility
enhancement technique for constrained RL, which enhances the feasibility of the
policy using a generalized control barrier function (GCBF) defined on the
distance to the constraint boundary. By using the model information, the policy
can be optimized safely without violating actual safety constraints, and sample
efficiency is increased. The major difficulty in solving the constrained policy
gradient, namely infeasibility, is handled by an adaptive coefficient
mechanism. We evaluate the proposed method in both simulation and real-vehicle
experiments on a complex autonomous-driving collision avoidance task. The
proposed method achieves up to four times fewer constraint violations and
converges 3.36 times faster than baseline constrained RL approaches.
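To make the GCBF idea concrete, here is a minimal sketch, not the authors' implementation: a discrete-time barrier condition on a safety distance h screens candidate actions through a one-step model prediction. The toy dynamics, the distance function, and the decay rate lam are all illustrative assumptions.

```python
import numpy as np

def dynamics(x, u):
    """Toy single-integrator stand-in for the model used to predict
    future states (an assumption; the paper uses a vehicle model)."""
    return x + 0.1 * u

def h(x):
    """Distance to the constraint boundary; h(x) >= 0 means safe.
    Illustrative constraint: stay inside the unit ball."""
    return 1.0 - np.linalg.norm(x)

def gcbf_ok(x, u, lam=0.5):
    """Discrete-time barrier condition h(f(x,u)) - h(x) >= -lam * h(x):
    the safety distance may shrink at most geometrically, so the
    predicted trajectory never crosses the constraint boundary."""
    return h(dynamics(x, u)) - h(x) >= -lam * h(x)

x = np.array([0.8, 0.0])
for u in (np.array([2.0, 0.0]), np.array([-2.0, 0.0])):
    print(u, "feasible" if gcbf_ok(x, u) else "violates GCBF")
```

In the paper, conditions of this kind constrain the policy update directly, with the adaptive coefficient mechanism handling states where no feasible action exists.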
Related papers
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains risks only in expectation, which leaves room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the reward-maximizing task objective according to a safety critic (sketched below).
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
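A minimal sketch of the objective-suppression idea above, assuming a sigmoid gate driven by the safety critic; the gating form, threshold, and all names are illustrative, not the paper's.

```python
import math

def suppressed_objective(task_q, safety_q, threshold=0.1, sharpness=10.0):
    """Down-weight the reward-maximizing objective when the safety critic
    predicts constraint cost above a threshold. task_q: task action-value;
    safety_q: predicted future constraint cost (both hypothetical names)."""
    gate = 1.0 / (1.0 + math.exp(sharpness * (safety_q - threshold)))
    return gate * task_q - safety_q  # ~task_q when safe, cost-driven when risky

print(suppressed_objective(task_q=5.0, safety_q=0.0))  # ~3.66: task dominates
print(suppressed_objective(task_q=5.0, safety_q=1.0))  # ~-1.00: suppressed
```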
- Sim-to-Real Transfer of Adaptive Control Parameters for AUV Stabilization under Current Disturbance [1.099532646524593]
This paper presents a novel approach, merging the Maximum Entropy Deep Reinforcement Learning framework with a classic model-based control architecture, to formulate an adaptive controller.
Within this framework, we introduce a Sim-to-Real transfer strategy comprising a bio-inspired experience replay mechanism, an enhanced domain randomisation technique (sketched below), and an evaluation protocol executed on a physical platform.
Our experimental assessments demonstrate that this method effectively learns proficient policies from suboptimal simulated models of the AUV, resulting in control performance three times higher when transferred to a real-world vehicle.
arXiv Detail & Related papers (2023-10-17T08:46:56Z)
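The domain-randomisation component could look roughly like the sketch below: uncertain physical parameters are resampled each episode so the policy learns to tolerate model mismatch. The parameter names and ranges are assumptions for illustration, not the paper's.

```python
import random

def randomized_auv_params(rng=random.Random(0)):
    """Resample uncertain AUV simulator parameters per training episode
    (illustrative names and ranges)."""
    return {
        "added_mass_scale": rng.uniform(0.8, 1.2),    # times nominal
        "drag_coeff_scale": rng.uniform(0.7, 1.3),
        "current_speed":    rng.uniform(0.0, 0.5),    # m/s disturbance
        "current_heading":  rng.uniform(0.0, 360.0),  # degrees
    }

for episode in range(3):
    print(f"episode {episode}:", randomized_auv_params())
    # a training loop would build the simulator from these parameters
    # and run one RL episode here
```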
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline agent and a safe agent (see the sketch below).
Such a decoupled framework enables high flexibility, data efficiency, and risk awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
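One plausible reading of the dual-agent hand-off, sketched with toy stand-ins; the risk critic, the budget, and the fallback rule are assumptions, not the paper's exact mechanism.

```python
def dual_agent_action(obs, baseline_policy, safe_policy, risk_critic,
                      risk_budget=0.1):
    """Decoupled control: the baseline agent maximizes task reward; the
    safe agent takes over whenever the estimated risk of the baseline
    action exceeds the budget (illustrative hand-off rule)."""
    action = baseline_policy(obs)
    if risk_critic(obs, action) > risk_budget:
        action = safe_policy(obs)  # safety correction from baseline
    return action

# Toy 1-D stand-ins: push forward, but never cross s = 1.
baseline = lambda s: 0.3                       # always push forward
safe     = lambda s: 0.0                       # conservative fallback
risk     = lambda s, a: max(0.0, s + a - 1.0)  # risk of overshooting s = 1

print(dual_agent_action(0.2, baseline, safe, risk))  # 0.3: baseline trusted
print(dual_agent_action(0.9, baseline, safe, risk))  # 0.0: safe agent corrects
```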
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate the cost constraints and removes the trust-region constraint via the clipped surrogate objective (see the sketch below).
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
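Reading the abstract literally, the P3O objective replaces the cost constraint with an exact penalty and the trust region with a clipped surrogate; a rough sketch follows, with kappa, eps, and the advantage inputs as illustrative assumptions rather than the authors' code.

```python
import numpy as np

def p3o_loss(ratio, adv_r, adv_c, cost_gap, kappa=1.0, eps=0.2):
    """P3O-style penalized objective (sketch).
    ratio: pi_new/pi_old per sample; adv_r/adv_c: reward/cost advantages;
    cost_gap: current expected cost minus the cost limit."""
    # Clipped surrogate for reward (PPO-style trust-region replacement).
    reward_surr = np.minimum(ratio * adv_r,
                             np.clip(ratio, 1 - eps, 1 + eps) * adv_r).mean()
    # Exact penalty: only active when the cost constraint is violated.
    cost_surr = (ratio * adv_c).mean() + cost_gap
    return -reward_surr + kappa * max(0.0, cost_surr)

ratio = np.array([0.9, 1.1, 1.3])
print(p3o_loss(ratio, adv_r=np.array([1.0, 0.5, 2.0]),
               adv_c=np.array([0.2, -0.1, 0.4]), cost_gap=-0.05))
```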
- Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian [5.686699342802045]
We propose a separated proportional-integral Lagrangian algorithm to enhance RL safety under uncertainty (the multiplier update is sketched below).
We demonstrate that our method can reduce the oscillations and conservatism of the RL policy in a car-following simulation.
arXiv Detail & Related papers (2021-08-26T07:34:14Z)
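The separated proportional-integral multiplier update might look like the sketch below: the P term reacts to the current violation, the I term to its history, and their combination damps the oscillation that a purely integral (standard Lagrangian) update exhibits. The gains and the toy violation sequence are assumptions.

```python
def pi_multiplier_update(violation, integral, kp=0.5, ki=0.1):
    """PI update of the Lagrange multiplier (illustrative gains).
    violation: constraint cost minus its limit at this iteration."""
    integral = max(0.0, integral + ki * violation)    # I term, kept nonnegative
    multiplier = max(0.0, kp * violation + integral)  # P term + I term
    return multiplier, integral

integral = 0.0
for violation in [0.4, 0.3, 0.1, -0.2, -0.2]:
    lam, integral = pi_multiplier_update(violation, integral)
    print(f"violation={violation:+.1f}  lambda={lam:.3f}")
```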
- Safe Continuous Control with Constrained Model-Based Policy Optimization [0.0]
We introduce a model-based safe exploration algorithm for constrained high-dimensional control.
We also introduce a practical algorithm that accelerates policy search with model-generated data.
arXiv Detail & Related papers (2021-04-14T15:20:55Z)
- Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ a neural network ensemble model to estimate prediction uncertainty and use model predictive control as the basic control framework (see the sketch below).
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-15T18:19:35Z)
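A compressed sketch of the pipeline described above, assuming toy scalar dynamics: a small model ensemble supplies prediction uncertainty, and a cross-entropy-method planner inside an MPC loop penalizes action sequences that any ensemble member predicts to violate the constraint. Every numeric choice here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
COEFFS = np.array([0.08, 0.09, 0.10, 0.11, 0.12])  # toy 5-member model ensemble

def rollout_cost(x0, u_seq):
    """Roll a candidate action sequence through every ensemble member.
    Task cost pulls the state toward 0; a large penalty fires if ANY
    member predicts a constraint violation (|x| > 1), a robust use of
    ensemble disagreement as prediction uncertainty."""
    x = np.full(len(COEFFS), x0)            # one state per member
    total = 0.0
    for u in u_seq:
        x = x + COEFFS * u                  # members disagree slightly
        total += np.abs(x).mean() + 100.0 * float((np.abs(x) > 1.0).any())
    return total

def cem_plan(x0, horizon=5, iters=5, pop=64, n_elite=8):
    """Cross-entropy method over action sequences (the MPC inner loop)."""
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        U = np.clip(rng.normal(mu, sigma, (pop, horizon)), -1.0, 1.0)
        elites = U[np.argsort([rollout_cost(x0, u) for u in U])[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mu[0]                            # MPC: execute only the first action

print("planned first action from x0 = 0.9:", cem_plan(0.9))
```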
- Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions [96.63967125746747]
The reinforcement learning framework learns the model uncertainty present in the CBF and CLF constraints.
RL-CBF-CLF-QP addresses the problem of model uncertainty in the safety constraints (a minimal CBF filter is sketched below).
arXiv Detail & Related papers (2020-04-16T10:51:33Z)
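For a feel of the underlying controller, here is a minimal 1-D CBF "QP" safety filter, assuming known dynamics xdot = u and barrier h(x) = 1 - x; in this paper the RL agent learns the model-uncertainty terms that would perturb such a constraint. The 1-D setting makes the QP solvable by a simple clamp.

```python
def cbf_filter(u_ref, x, alpha=1.0):
    """Minimize (u - u_ref)^2 subject to the CBF condition
    hdot >= -alpha * h(x). With h(x) = 1 - x and xdot = u this reads
    -u >= -alpha * (1 - x), i.e. u <= alpha * (1 - x), so the QP
    solution is a clamp (illustrative; the paper's QP also carries
    CLF stability constraints and learned uncertainty terms)."""
    u_max = alpha * (1.0 - x)   # CBF-induced upper bound on the control
    return min(u_ref, u_max)

print(cbf_filter(u_ref=2.0, x=0.2))  # 0.8: unsafe command gets clipped
print(cbf_filter(u_ref=0.3, x=0.2))  # 0.3: safe command passes through
```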
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.