Guided Safe Shooting: model based reinforcement learning with safety
constraints
- URL: http://arxiv.org/abs/2206.09743v1
- Date: Mon, 20 Jun 2022 12:46:35 GMT
- Title: Guided Safe Shooting: model based reinforcement learning with safety
constraints
- Authors: Giuseppe Paolo and Jonas Gonzalez-Billandon and Albert Thomas and
Balázs Kégl
- Abstract summary: We introduce Guided Safe Shooting (GuSS), a model-based RL approach that can learn to control systems with minimal violations of the safety constraints.
We propose three different safe planners, one based on a simple random shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm.
- Score: 4.431335899583956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last decade, reinforcement learning successfully solved complex
control tasks and decision-making problems, like the Go board game. Yet, there
are few success stories when it comes to deploying those algorithms to
real-world scenarios. One of the reasons is the lack of guarantees when dealing
with and avoiding unsafe states, a fundamental requirement in critical control
engineering systems. In this paper, we introduce Guided Safe Shooting (GuSS), a
model-based RL approach that can learn to control systems with minimal
violations of the safety constraints. The model is learned on the data
collected during the operation of the system in an iterated batch fashion, and
is then used to plan for the best action to perform at each time step. We
propose three different safe planners, one based on a simple random shooting
strategy and two based on MAP-Elites, a more advanced divergent-search
algorithm. Experiments show that these planners help the learning agent avoid
unsafe situations while maximally exploring the state space, a necessary aspect
when learning an accurate model of the system. Furthermore, compared to
model-free approaches, learning a model allows GuSS to reduce the number of
interactions with the real system while still reaching high rewards, a
fundamental requirement when handling engineering systems.
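To make the plan-then-act loop concrete, below is a minimal sketch of the simplest of the three planners named in the abstract: a safe random-shooting planner that samples candidate action sequences, rolls them out through the learned model, discards candidates whose predicted trajectories reach unsafe states, and executes the first action of the best surviving sequence. The `model`, `reward_fn`, and `is_unsafe` callables are hypothetical placeholders, and the sketch is only an illustration of the idea, not the authors' implementation.

```python
# Minimal sketch of a safe random-shooting planner (illustration only, not the
# authors' code). `model`, `reward_fn`, and `is_unsafe` are hypothetical
# placeholders for the learned dynamics model, the reward, and the safety test.
import numpy as np

def plan_safe_shooting(model, reward_fn, is_unsafe, state,
                       horizon=20, n_candidates=500, action_dim=2, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    best_action, best_return = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))  # random shooting
        s, total, safe = state, 0.0, True
        for a in actions:
            s = model(s, a)          # one-step prediction with the learned model
            if is_unsafe(s):         # drop candidates predicted to reach unsafe states
                safe = False
                break
            total += reward_fn(s, a)
        if safe and total > best_return:
            best_return, best_action = total, actions[0]
    # if every candidate was predicted unsafe, fall back to a random action
    if best_action is None:
        best_action = rng.uniform(-1.0, 1.0, size=action_dim)
    return best_action
```

The two MAP-Elites-based planners follow the same plan-then-act loop but, instead of keeping a single best candidate, maintain an archive of diverse high-scoring candidates, which matches the abstract's point about broad state-space exploration.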
Related papers
- Reinforcement Learning with Ensemble Model Predictive Safety
Certification [2.658598582858331]
Unsupervised exploration prevents the deployment of reinforcement learning algorithms on safety-critical tasks.
We propose a new algorithm that combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent.
Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.
arXiv Detail & Related papers (2024-02-06T17:42:39Z)
- Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning [1.3678669691302048]
This paper proposes a novel approach for the use of deep reinforcement learning in safety-critical systems.
It combines the advantages of probabilistic modeling and reinforcement learning with the added benefits of interpretability.
arXiv Detail & Related papers (2023-10-28T20:30:57Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violations in policy tasks in safe reinforcement learning (a toy sketch of the log-barrier idea appears after this list).
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning [24.56889192688925]
Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
arXiv Detail & Related papers (2021-12-23T00:44:38Z)
- Towards Safe Continuing Task Reinforcement Learning [21.390201009230246]
We propose an algorithm capable of operating in the continuing task setting without the need for restarts.
We evaluate our approach in a numerical example, which shows the capabilities of the proposed approach in learning safe policies via safe exploration.
arXiv Detail & Related papers (2021-02-24T22:12:25Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
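As a rough illustration of the log-barrier idea referenced in the Log Barriers entry above: the toy sketch below takes descent steps on a barrier-penalized objective and shrinks the step size until the next iterate stays strictly feasible. The objective, constraint, and step-size rule are invented for the example; this is not the LBSGD algorithm itself.

```python
# Toy sketch of a log-barrier penalized gradient step (illustration only; the
# objective, constraint, and step-size rule are invented for this example).
import numpy as np

TARGET = np.array([3.0, 2.0])

def objective_grad(x):
    return x - TARGET                 # gradient of 0.5 * ||x - TARGET||^2

def constraint(x):
    return x[0] + x[1] - 4.0          # "safe" iff constraint(x) <= 0

def constraint_grad(x):
    return np.array([1.0, 1.0])

def log_barrier_step(x, eta=0.1, base_lr=0.1):
    """One descent step on f(x) - eta * log(-g(x)); the step size is halved
    until the next iterate remains strictly feasible (g < 0)."""
    g = constraint(x)
    grad = objective_grad(x) - eta * constraint_grad(x) / g
    lr = base_lr
    while constraint(x - lr * grad) >= 0.0:
        lr *= 0.5
    return x - lr * grad

x = np.array([0.0, 0.0])              # strictly feasible starting point
for _ in range(300):
    x = log_barrier_step(x)
print(x, constraint(x))               # ends near the constrained optimum with g(x) < 0
```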
This list is automatically generated from the titles and abstracts of the papers on this site.