Safe Continuous Control with Constrained Model-Based Policy Optimization
- URL: http://arxiv.org/abs/2104.06922v1
- Date: Wed, 14 Apr 2021 15:20:55 GMT
- Title: Safe Continuous Control with Constrained Model-Based Policy Optimization
- Authors: Moritz A. Zanger, Karam Daaboul, J. Marius Zöllner
- Abstract summary: We introduce a model-based safe exploration algorithm for constrained high-dimensional control.
We also introduce a practical algorithm that accelerates policy search with model-generated data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The applicability of reinforcement learning (RL) algorithms in real-world
domains often requires adherence to safety constraints, a need difficult to
address given the asymptotic nature of the classic RL optimization objective.
In contrast to the traditional RL objective, safe exploration considers the
maximization of expected returns under safety constraints expressed in terms of
expected cost returns. We introduce a model-based safe exploration algorithm
for constrained high-dimensional control to address the often prohibitively
high sample complexity of model-free safe exploration algorithms. Further, we
provide theoretical and empirical analyses regarding the implications of model
usage on constrained policy optimization problems and introduce a practical
algorithm that accelerates policy search with model-generated data. The need
for accurate estimates of a policy's constraint satisfaction conflicts with
accumulating model errors. We address this issue by quantifying model
uncertainty as the expected Kullback-Leibler divergence between the predictions
of an ensemble of probabilistic dynamics models and constraining this error
measure, resulting in an adaptive resampling scheme and dynamically limited
rollout horizons. We evaluate this approach on several simulated constrained
robot locomotion tasks with high-dimensional action and state spaces. Our
empirical studies find that our algorithm matches the performance of model-free
methods with a 10-20 fold reduction in training samples while maintaining the
approximate constraint satisfaction levels of model-free methods.
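In the constrained setting described above, the policy maximizes expected return subject to an expected cost return staying below a budget, and model rollouts are trusted only while ensemble disagreement stays bounded. Below is a minimal Python/NumPy sketch of such an uncertainty-limited rollout; the diagonal-Gaussian ensemble interface, the pairwise-KL disagreement measure, and names such as kl_max and predict are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def diag_gauss_kl(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def ensemble_disagreement(means, variances):
    """Average pairwise KL between the members' predictive Gaussians --
    one possible reading of the paper's 'expected KL divergence between
    predictions of an ensemble'. Assumes at least two members."""
    n = len(means)
    kls = [
        diag_gauss_kl(means[i], variances[i], means[j], variances[j])
        for i in range(n) for j in range(n) if i != j
    ]
    return float(np.mean(kls))

def rollout(policy, ensemble, start_state, kl_max=0.1, max_horizon=50):
    """Model-based rollout whose horizon is cut short once ensemble
    disagreement (a proxy for accumulated model error) exceeds kl_max."""
    transitions, state = [], start_state
    for _ in range(max_horizon):
        action = policy(state)
        # Each ensemble member predicts a Gaussian over the next state.
        means, variances = zip(*(m.predict(state, action) for m in ensemble))
        if ensemble_disagreement(means, variances) > kl_max:
            break  # model error too large: stop trusting the rollout
        idx = np.random.randint(len(ensemble))  # resample a member per step
        next_state = np.random.normal(means[idx], np.sqrt(variances[idx]))
        transitions.append((state, action, next_state))
        state = next_state
    return transitions
```

The threshold kl_max plays the role of the constrained error measure: lowering it shortens rollouts in regions where the dynamics models disagree, trading sample efficiency for more reliable constraint estimates.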
Related papers
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Model-based Safe Deep Reinforcement Learning via a Constrained Proximal
Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z) - Model-based Safe Reinforcement Learning using Generalized Control
Barrier Function [6.556257209888797]
This paper proposes a model-based feasibility enhancement technique for constrained RL.
By using the model information, the policy can be optimized safely without violating actual safety constraints.
The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
arXiv Detail & Related papers (2021-03-02T08:17:38Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - Constrained Model-based Reinforcement Learning with Robust Cross-Entropy
Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ the neural network ensemble model to estimate the prediction uncertainty and use model predictive control as the basic control framework.
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-15T18:19:35Z) - Learning with Safety Constraints: Sample Complexity of Reinforcement
Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints.
arXiv Detail & Related papers (2020-08-01T18:17:08Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.