Model-Based Actor-Critic with Chance Constraint for Stochastic System
- URL: http://arxiv.org/abs/2012.10716v2
- Date: Tue, 16 Mar 2021 04:34:04 GMT
- Title: Model-Based Actor-Critic with Chance Constraint for Stochastic System
- Authors: Baiyu Peng, Yao Mu, Yang Guan, Shengbo Eben Li, Yuming Yin, Jianyu
Chen
- Abstract summary: We propose a model-based chance constrained actor-critic (CCAC) algorithm which can efficiently learn a safe and non-conservative policy.
CCAC directly solves the original chance constrained problem, where the objective function and the safe probability are simultaneously optimized with adaptive weights.
- Score: 6.600423613245076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety is essential for reinforcement learning (RL) applied in real-world
situations. Chance constraints are suitable to represent the safety
requirements in stochastic systems. Previous chance-constrained RL methods
usually have a low convergence rate, or only learn a conservative policy. In
this paper, we propose a model-based chance constrained actor-critic (CCAC)
algorithm which can efficiently learn a safe and non-conservative policy.
Different from existing methods that optimize a conservative lower bound, CCAC
directly solves the original chance constrained problem, where the objective
function and the safe probability are simultaneously optimized with adaptive
weights. To improve the convergence rate, CCAC utilizes the gradient of the
dynamic model to accelerate policy optimization. The effectiveness of CCAC
is demonstrated on a stochastic car-following task. Experiments indicate that,
compared with previous RL methods, CCAC improves performance while
guaranteeing safety, with a five times faster convergence rate. It also achieves
100 times higher online computation efficiency than traditional safety
techniques such as stochastic model predictive control.
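As a rough illustration of the mechanism described in the abstract, the sketch below optimizes a weighted objective (mean return plus an adaptively weighted safe-probability term) for a toy one-dimensional stochastic system. This is only a hedged sketch, not the authors' implementation: the linear policy u = -theta * x, the `simulate()` helper, the sigmoid surrogate for the safe probability, and the common-random-number finite difference are all illustrative assumptions standing in for CCAC's analytic gradient through a learned dynamics model.

```python
# Minimal sketch (not the authors' code) of a chance-constrained objective with
# an adaptive safety weight. Toy dynamics, policy, and all names are assumptions.
import numpy as np

def simulate(theta, seed, horizon=20, n_samples=256):
    """Roll out u = -theta * x in a toy stochastic system. Returns the mean
    return, a smooth (differentiable) surrogate of the safe probability, and
    the empirical probability that the whole trajectory satisfies x > -1."""
    rng = np.random.default_rng(seed)
    x = rng.normal(2.0, 0.5, size=n_samples)           # initial states
    ret = np.zeros(n_samples)
    margin = np.full(n_samples, np.inf)                # worst-case distance to the unsafe set
    for _ in range(horizon):
        u = -theta * x
        ret += -(x ** 2 + 0.1 * u ** 2)                # tracking-style reward
        x = x + u + rng.normal(0.0, 0.2, size=n_samples)
        margin = np.minimum(margin, x + 1.0)           # safety requires x > -1 at every step
    smooth_safe = 1.0 / (1.0 + np.exp(-margin / 0.1))  # sigmoid surrogate of the indicator
    return ret.mean(), smooth_safe.mean(), float((margin > 0).mean())

alpha = 0.95                 # required safe probability (chance-constraint level)
lam = 1.0                    # adaptive weight on the safety term
theta = 0.1                  # policy parameter
lr_theta, lr_lam, eps = 1e-3, 2.0, 1e-4

for it in range(300):
    # Ascend the weighted objective: mean return + lam * (smooth safe-probability
    # surrogate), via a common-random-number finite difference. This stands in for
    # the model-based analytic gradient that CCAC uses.
    rp, sp, _ = simulate(theta + eps, it)
    rm, sm, _ = simulate(theta - eps, it)
    theta += lr_theta * ((rp + lam * sp) - (rm + lam * sm)) / (2 * eps)

    # Adaptive weight: increase lam whenever the empirical safe probability
    # drops below the required level alpha, decrease it otherwise.
    _, _, p_safe = simulate(theta, it)
    lam = max(0.0, lam + lr_lam * (alpha - p_safe))

print(f"theta = {theta:.3f}, lam = {lam:.2f}, safe probability ~ {simulate(theta, 0)[2]:.3f}")
```

In this reading, the adaptive weight behaves like a Lagrange-style multiplier: it grows while the chance constraint is violated and relaxes toward zero once the required safe probability is met, which is one way the objective and the safe probability can be traded off without hand-tuned penalties.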
Related papers
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Safe and Efficient Reinforcement Learning Using Disturbance-Observer-Based Control Barrier Functions [5.571154223075409]
This paper presents a method for safe and efficient reinforcement learning (RL) using disturbance observers (DOBs) and control barrier functions (CBFs).
Our method does not involve model learning, and leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions.
Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian processes-based model learning.
arXiv Detail & Related papers (2022-11-30T18:49:53Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in safe reinforcement learning policy tasks.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning [14.999515900425305]
We propose a Conservative Update Policy with a theoretical safety guarantee.
We provide rigorous theoretical analysis to extend the surrogate functions to generalized advantage estimation (GAE).
Experiments show the effectiveness of CUP in designing safe constraints.
arXiv Detail & Related papers (2022-02-15T16:49:28Z)
- Chance Constrained Policy Optimization for Process Control and Optimization [1.4908563154226955]
Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation.
We propose a chance constrained policy optimization algorithm which guarantees the satisfaction of joint chance constraints with a high probability.
arXiv Detail & Related papers (2020-07-30T14:20:35Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)