Model-based Safe Reinforcement Learning using Generalized Control Barrier Function
- URL: http://arxiv.org/abs/2103.01556v1
- Date: Tue, 2 Mar 2021 08:17:38 GMT
- Title: Model-based Safe Reinforcement Learning using Generalized Control Barrier Function
- Authors: Haitong Ma, Jianyu Chen, Shengbo Eben Li, Ziyu Lin, Sifa Zheng
- Abstract summary: This paper proposes a model-based feasibility enhancement technique for constrained RL.
By using the model information, the policy can be optimized safely without violating actual safety constraints.
The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
- Score: 6.556257209888797
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Model information can be used to predict future trajectories, so it
has great potential for avoiding dangerous regions when applying reinforcement
learning (RL) to real-world tasks such as autonomous driving. However, existing
studies mostly use model-free constrained RL, which leads to inevitable
constraint violations. This paper proposes a model-based feasibility
enhancement technique for constrained RL, which enhances the feasibility of the
policy using a generalized control barrier function (GCBF) defined on the
distance to the constraint boundary. By using the model information, the policy
can be optimized safely without violating actual safety constraints, and sample
efficiency is increased. The major difficulty in solving the constrained policy
gradient, namely infeasibility, is handled by an adaptive coefficient
mechanism. We evaluate the proposed method in both simulation and real-vehicle
experiments on a complex autonomous-driving collision avoidance task. The
proposed method achieves up to four times fewer constraint violations and
converges 3.36 times faster than baseline constrained RL approaches.
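To make the GCBF idea concrete, here is a minimal sketch, not the authors' implementation: a discrete-time barrier condition on a safety distance h screens candidate actions through a one-step model prediction. The toy dynamics, the distance function, and the decay rate lam are all illustrative assumptions.

```python
import numpy as np

def dynamics(x, u):
    """Toy single-integrator stand-in for the model used to predict
    future states (an assumption; the paper uses a vehicle model)."""
    return x + 0.1 * u

def h(x):
    """Distance to the constraint boundary; h(x) >= 0 means safe.
    Illustrative constraint: stay inside the unit ball."""
    return 1.0 - np.linalg.norm(x)

def gcbf_ok(x, u, lam=0.5):
    """Discrete-time barrier condition h(f(x,u)) - h(x) >= -lam * h(x):
    the safety distance may shrink at most geometrically, so the
    predicted trajectory never crosses the constraint boundary."""
    return h(dynamics(x, u)) - h(x) >= -lam * h(x)

x = np.array([0.8, 0.0])
for u in (np.array([2.0, 0.0]), np.array([-2.0, 0.0])):
    print(u, "feasible" if gcbf_ok(x, u) else "violates GCBF")
```

In the paper, conditions of this kind constrain the policy update directly, with the adaptive coefficient mechanism handling states where no feasible action exists.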
Related papers
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains risks only in expectation, which leaves room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the reward-maximizing task objective according to a safety critic (sketched below).
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
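A minimal sketch of the objective-suppression idea above, assuming a sigmoid gate driven by the safety critic; the gating form, threshold, and all names are illustrative, not the paper's.

```python
import math

def suppressed_objective(task_q, safety_q, threshold=0.1, sharpness=10.0):
    """Down-weight the reward-maximizing objective when the safety critic
    predicts constraint cost above a threshold. task_q: task action-value;
    safety_q: predicted future constraint cost (both hypothetical names)."""
    gate = 1.0 / (1.0 + math.exp(sharpness * (safety_q - threshold)))
    return gate * task_q - safety_q  # ~task_q when safe, cost-driven when risky

print(suppressed_objective(task_q=5.0, safety_q=0.0))  # ~3.66: task dominates
print(suppressed_objective(task_q=5.0, safety_q=1.0))  # ~-1.00: suppressed
```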
- Sim-to-Real Transfer of Adaptive Control Parameters for AUV Stabilization under Current Disturbance [1.099532646524593]
This paper presents a novel approach, merging the Maximum Entropy Deep Reinforcement Learning framework with a classic model-based control architecture, to formulate an adaptive controller.
Within this framework, we introduce a Sim-to-Real transfer strategy comprising a bio-inspired experience replay mechanism, an enhanced domain randomisation technique (sketched below), and an evaluation protocol executed on a physical platform.
Our experimental assessments demonstrate that this method effectively learns proficient policies from suboptimal simulated models of the AUV, resulting in control performance three times higher when transferred to a real-world vehicle.
arXiv Detail & Related papers (2023-10-17T08:46:56Z)
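The domain-randomisation component could look roughly like the sketch below: uncertain physical parameters are resampled each episode so the policy learns to tolerate model mismatch. The parameter names and ranges are assumptions for illustration, not the paper's.

```python
import random

def randomized_auv_params(rng=random.Random(0)):
    """Resample uncertain AUV simulator parameters per training episode
    (illustrative names and ranges)."""
    return {
        "added_mass_scale": rng.uniform(0.8, 1.2),    # times nominal
        "drag_coeff_scale": rng.uniform(0.7, 1.3),
        "current_speed":    rng.uniform(0.0, 0.5),    # m/s disturbance
        "current_heading":  rng.uniform(0.0, 360.0),  # degrees
    }

for episode in range(3):
    print(f"episode {episode}:", randomized_auv_params())
    # a training loop would build the simulator from these parameters
    # and run one RL episode here
```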
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline agent and a safe agent (see the sketch below).
Such a decoupled framework enables high flexibility, data efficiency, and risk awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
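One plausible reading of the dual-agent hand-off, sketched with toy stand-ins; the risk critic, the budget, and the fallback rule are assumptions, not the paper's exact mechanism.

```python
def dual_agent_action(obs, baseline_policy, safe_policy, risk_critic,
                      risk_budget=0.1):
    """Decoupled control: the baseline agent maximizes task reward; the
    safe agent takes over whenever the estimated risk of the baseline
    action exceeds the budget (illustrative hand-off rule)."""
    action = baseline_policy(obs)
    if risk_critic(obs, action) > risk_budget:
        action = safe_policy(obs)  # safety correction from baseline
    return action

# Toy 1-D stand-ins: push forward, but never cross s = 1.
baseline = lambda s: 0.3                       # always push forward
safe     = lambda s: 0.0                       # conservative fallback
risk     = lambda s, a: max(0.0, s + a - 1.0)  # risk of overshooting s = 1

print(dual_agent_action(0.2, baseline, safe, risk))  # 0.3: baseline trusted
print(dual_agent_action(0.9, baseline, safe, risk))  # 0.0: safe agent corrects
```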
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate the cost constraints and removes the trust-region constraint via the clipped surrogate objective (see the sketch below).
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
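Reading the abstract literally, the P3O objective replaces the cost constraint with an exact penalty and the trust region with a clipped surrogate; a rough sketch follows, with kappa, eps, and the advantage inputs as illustrative assumptions rather than the authors' code.

```python
import numpy as np

def p3o_loss(ratio, adv_r, adv_c, cost_gap, kappa=1.0, eps=0.2):
    """P3O-style penalized objective (sketch).
    ratio: pi_new/pi_old per sample; adv_r/adv_c: reward/cost advantages;
    cost_gap: current expected cost minus the cost limit."""
    # Clipped surrogate for reward (PPO-style trust-region replacement).
    reward_surr = np.minimum(ratio * adv_r,
                             np.clip(ratio, 1 - eps, 1 + eps) * adv_r).mean()
    # Exact penalty: only active when the cost constraint is violated.
    cost_surr = (ratio * adv_c).mean() + cost_gap
    return -reward_surr + kappa * max(0.0, cost_surr)

ratio = np.array([0.9, 1.1, 1.3])
print(p3o_loss(ratio, adv_r=np.array([1.0, 0.5, 2.0]),
               adv_c=np.array([0.2, -0.1, 0.4]), cost_gap=-0.05))
```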
- Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian [5.686699342802045]
We propose a separated proportional-integral Lagrangian algorithm to enhance RL safety under uncertainty (the multiplier update is sketched below).
We demonstrate that our method can reduce the oscillations and conservatism of the RL policy in a car-following simulation.
arXiv Detail & Related papers (2021-08-26T07:34:14Z)
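The separated proportional-integral multiplier update might look like the sketch below: the P term reacts to the current violation, the I term to its history, and their combination damps the oscillation that a purely integral (standard Lagrangian) update exhibits. The gains and the toy violation sequence are assumptions.

```python
def pi_multiplier_update(violation, integral, kp=0.5, ki=0.1):
    """PI update of the Lagrange multiplier (illustrative gains).
    violation: constraint cost minus its limit at this iteration."""
    integral = max(0.0, integral + ki * violation)    # I term, kept nonnegative
    multiplier = max(0.0, kp * violation + integral)  # P term + I term
    return multiplier, integral

integral = 0.0
for violation in [0.4, 0.3, 0.1, -0.2, -0.2]:
    lam, integral = pi_multiplier_update(violation, integral)
    print(f"violation={violation:+.1f}  lambda={lam:.3f}")
```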
- Safe Continuous Control with Constrained Model-Based Policy Optimization [0.0]
We introduce a model-based safe exploration algorithm for constrained high-dimensional control.
We also introduce a practical algorithm that accelerates policy search with model-generated data.
arXiv Detail & Related papers (2021-04-14T15:20:55Z)
- Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ a neural network ensemble model to estimate prediction uncertainty and use model predictive control as the basic control framework (see the sketch below).
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-15T18:19:35Z)
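A compressed sketch of the pipeline described above, assuming toy scalar dynamics: a small model ensemble supplies prediction uncertainty, and a cross-entropy-method planner inside an MPC loop penalizes action sequences that any ensemble member predicts to violate the constraint. Every numeric choice here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
COEFFS = np.array([0.08, 0.09, 0.10, 0.11, 0.12])  # toy 5-member model ensemble

def rollout_cost(x0, u_seq):
    """Roll a candidate action sequence through every ensemble member.
    Task cost pulls the state toward 0; a large penalty fires if ANY
    member predicts a constraint violation (|x| > 1), a robust use of
    ensemble disagreement as prediction uncertainty."""
    x = np.full(len(COEFFS), x0)            # one state per member
    total = 0.0
    for u in u_seq:
        x = x + COEFFS * u                  # members disagree slightly
        total += np.abs(x).mean() + 100.0 * float((np.abs(x) > 1.0).any())
    return total

def cem_plan(x0, horizon=5, iters=5, pop=64, n_elite=8):
    """Cross-entropy method over action sequences (the MPC inner loop)."""
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        U = np.clip(rng.normal(mu, sigma, (pop, horizon)), -1.0, 1.0)
        elites = U[np.argsort([rollout_cost(x0, u) for u in U])[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mu[0]                            # MPC: execute only the first action

print("planned first action from x0 = 0.9:", cem_plan(0.9))
```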
- Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions [96.63967125746747]
The reinforcement learning framework learns the model uncertainty present in the CBF and CLF constraints.
RL-CBF-CLF-QP addresses the problem of model uncertainty in the safety constraints (a minimal CBF filter is sketched below).
arXiv Detail & Related papers (2020-04-16T10:51:33Z)
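For a feel of the underlying controller, here is a minimal 1-D CBF "QP" safety filter, assuming known dynamics xdot = u and barrier h(x) = 1 - x; in this paper the RL agent learns the model-uncertainty terms that would perturb such a constraint. The 1-D setting makes the QP solvable by a simple clamp.

```python
def cbf_filter(u_ref, x, alpha=1.0):
    """Minimize (u - u_ref)^2 subject to the CBF condition
    hdot >= -alpha * h(x). With h(x) = 1 - x and xdot = u this reads
    -u >= -alpha * (1 - x), i.e. u <= alpha * (1 - x), so the QP
    solution is a clamp (illustrative; the paper's QP also carries
    CLF stability constraints and learned uncertainty terms)."""
    u_max = alpha * (1.0 - x)   # CBF-induced upper bound on the control
    return min(u_ref, u_max)

print(cbf_filter(u_ref=2.0, x=0.2))  # 0.8: unsafe command gets clipped
print(cbf_filter(u_ref=0.3, x=0.2))  # 0.3: safe command passes through
```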
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.