Iterative Reachability Estimation for Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2309.13528v1
- Date: Sun, 24 Sep 2023 02:36:42 GMT
- Title: Iterative Reachability Estimation for Safe Reinforcement Learning
- Authors: Milan Ganai, Zheng Gong, Chenning Yu, Sylvia Herbert, Sicun Gao
- Abstract summary: We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained reinforcement learning (RL) environments.
In the feasible set where there exist violation-free policies, we optimize for rewards while maintaining persistent safety.
We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo.
- Score: 23.942701020636882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensuring safety is important for the practical deployment of reinforcement
learning (RL). Various challenges must be addressed, such as handling
stochasticity in the environments, providing rigorous guarantees of persistent
state-wise safety satisfaction, and avoiding overly conservative behaviors that
sacrifice performance. We propose a new framework, Reachability Estimation for
Safe Policy Optimization (RESPO), for safety-constrained RL in general
stochastic settings. In the feasible set where there exist violation-free
policies, we optimize for rewards while maintaining persistent safety. Outside
this feasible set, our optimization produces the safest behavior by
guaranteeing entrance into the feasible set whenever possible with the least
cumulative discounted violations. We introduce a class of algorithms using our
novel reachability estimation function to optimize in our proposed framework
and in similar frameworks such as those concurrently handling multiple hard and
soft constraints. We theoretically establish that our algorithms almost surely
converge to locally optimal policies of our safe optimization framework. We
evaluate the proposed methods on a diverse suite of safe RL environments from
Safety Gym, PyBullet, and MuJoCo, and show the benefits in improving both
reward performance and safety compared with state-of-the-art baselines.
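To make the framework's structure concrete, here is a minimal sketch of how a reachability-gated objective could be organized, assuming a per-rollout reachability estimate is available; the function names, threshold, and objective split below are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def respo_style_objective(rewards, violations, reach_estimate,
                          gamma=0.99, feasible_eps=1e-3):
    """Illustrative split between the feasible and infeasible cases
    described in the abstract (a sketch, not the authors' code).

    rewards, violations: per-step arrays collected from one rollout.
    reach_estimate: estimated reachability value for the rollout's start
        state, i.e. how likely the current policy is to eventually reach
        an unsafe state.
    """
    discounts = gamma ** np.arange(len(rewards))
    disc_reward = float(np.sum(discounts * np.asarray(rewards)))
    disc_violation = float(np.sum(discounts * np.asarray(violations)))

    if reach_estimate <= feasible_eps:
        # Feasible set: optimize reward, with persistent safety treated
        # as a constraint handled elsewhere (e.g. by a Lagrangian term).
        return disc_reward
    # Outside the feasible set: drive the policy back toward feasibility
    # by minimizing cumulative discounted violations.
    return -disc_violation
```

In an actor-critic loop, this scalar would play the role of the per-rollout objective whose gradient is followed; the reachability estimate itself would be learned alongside the policy, which is the part the paper's reachability estimation function addresses.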
Related papers
- Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation [26.244121960815907]
Managing the trade-off between reward and safety during exploration presents a significant challenge.
In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation.
Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.
arXiv Detail & Related papers (2024-05-02T19:07:14Z)
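The entry above only names gradient manipulation; as background, such schemes typically reconcile a reward gradient and a safety gradient before each update. The projection below is a generic illustration of that idea and is not claimed to be this paper's exact rule.

```python
import numpy as np

def reconcile_gradients(grad_reward, grad_safety, eps=1e-12):
    """Generic gradient-manipulation step (illustrative): if the reward
    gradient conflicts with the safety gradient (negative inner product),
    remove the conflicting component so the update no longer pushes
    against safety."""
    dot = float(np.dot(grad_reward, grad_safety))
    if dot >= 0.0:
        return grad_reward  # no conflict: keep the reward gradient
    # Project out the component of the reward gradient that opposes safety.
    scale = dot / (float(np.dot(grad_safety, grad_safety)) + eps)
    return grad_reward - scale * grad_safety
```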
- Concurrent Learning of Policy and Unknown Safety Constraints in Reinforcement Learning [4.14360329494344]
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades.
Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety.
Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process.
We propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment.
arXiv Detail & Related papers (2024-02-24T20:01:15Z)
- Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments [63.053364805943026]
We extend the approximate model-based shielding framework to the continuous setting.
In particular, we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms.
arXiv Detail & Related papers (2024-02-01T17:55:08Z)
- Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning [33.988698754176646]
We introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules.
Our experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance.
This makes our approach suitable for real-world dynamic applications.
arXiv Detail & Related papers (2023-10-05T17:39:02Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach effectively enforces hard safety constraints and significantly outperforms CMDP-based baseline methods in the system safety rate measured in simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
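For orientation, the log-barrier idea behind LBSGD can be sketched as follows; the barrier weight and the step-size rule here are illustrative assumptions rather than the paper's exact choices.

```python
import numpy as np

def log_barrier_surrogate(reward, constraint_slack, eta=0.1):
    """Barrier surrogate (sketch): maximize reward plus eta*log(slack),
    where slack = safety budget minus incurred cost. The log term tends to
    minus infinity as the slack shrinks, so maximizing the surrogate keeps
    iterates away from the constraint boundary."""
    if constraint_slack <= 0.0:
        return -np.inf  # the surrogate is undefined outside the safe region
    return reward + eta * np.log(constraint_slack)

def barrier_aware_step_size(grad_norm, constraint_slack, base_lr=1e-2):
    """Illustrative step-size rule: shrink the step near the boundary so a
    single update cannot jump out of the safe set."""
    return min(base_lr, 0.5 * constraint_slack / (grad_norm + 1e-12))
```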
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Constrained Variational Policy Optimization for Safe Reinforcement Learning [40.38842532850959]
Safe reinforcement learning aims to learn policies that satisfy certain constraints before deploying to safety-critical applications.
The primal-dual method, a prevalent constrained optimization framework, suffers from instability issues and lacks optimality guarantees.
This paper overcomes these issues from a novel probabilistic inference perspective and proposes an Expectation-Maximization style approach to learning safe policies.
arXiv Detail & Related papers (2022-01-28T04:24:09Z)
- Constrained Policy Optimization via Bayesian World Models [79.0077602277004]
LAMBDA is a model-based approach for policy optimization in safety critical tasks modeled via constrained Markov decision processes.
We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
arXiv Detail & Related papers (2022-01-24T17:02:22Z)
- Chance Constrained Policy Optimization for Process Control and Optimization [1.4908563154226955]
Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation.
We propose a chance constrained policy optimization algorithm which guarantees the satisfaction of joint chance constraints with a high probability.
arXiv Detail & Related papers (2020-07-30T14:20:35Z)
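For reference, a joint chance constraint of the kind mentioned above is commonly written as follows (generic notation assumed for illustration, not taken from the paper): all safety constraints must hold simultaneously over the horizon with probability at least 1 - alpha.

```latex
\Pr\Big[\, g_j(x_t) \le 0 \quad \forall\, j \in \{1,\dots,m\},\ \forall\, t \in \{0,\dots,T\} \,\Big] \;\ge\; 1 - \alpha
```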
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)