Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic
through Gaussian Processes and Control Barrier Functions
- URL: http://arxiv.org/abs/2109.02791v1
- Date: Tue, 7 Sep 2021 00:51:12 GMT
- Title: Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic
through Gaussian Processes and Control Barrier Functions
- Authors: Mingyu Cai, Cristian-Ioan Vasile
- Abstract summary: Reinforcement learning (RL) is a promising approach but has had limited success in real-world applications.
In this paper, we propose a learning-based control framework consisting of several aspects.
We show that such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guards safety with high probability.
- Score: 3.5897534810405403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a promising approach but has had
limited success in real-world applications, because ensuring safe exploration
and facilitating adequate exploitation are challenging when controlling robotic
systems with unknown models and measurement uncertainties. Such a learning
problem becomes even more intractable for complex tasks over continuous state
and action spaces. In this paper, we propose a learning-based control framework
consisting of several aspects: (1) linear temporal logic (LTL) is leveraged to
specify complex tasks over an infinite horizon, and the specification is
translated into a novel automaton structure; (2) we propose an innovative
reward scheme for the RL agent with a formal guarantee that globally optimal
policies maximize the probability of satisfying the LTL specifications; (3)
based on a reward shaping technique, we develop a modular policy-gradient
architecture that exploits the automaton structure to decompose the overall
task and improve the performance of the learned controllers; (4) by
incorporating Gaussian Processes (GPs) to estimate the uncertain system
dynamics, we synthesize a model-based safeguard using Exponential Control
Barrier Functions (ECBFs) to handle safety constraints with high-order relative
degrees. In addition, we utilize the properties of the LTL automata and the
ECBFs to construct a guiding process that further improves exploration
efficiency. Finally, we demonstrate the effectiveness of the framework in
several robotic environments, and we show that such an ECBF-based modular deep
RL algorithm achieves near-perfect success rates and guards safety with high
probability during training.
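
For context on item (4), the display below gives the standard exponential CBF condition for a safety function h(x) of relative degree r under control-affine dynamics; this is a minimal sketch in generic CBF-literature notation, not an equation reproduced from the paper.

```latex
% Standard exponential CBF (ECBF) condition for a safety function h(x) with
% relative degree r, under control-affine dynamics  \dot{x} = f(x) + g(x)u.
% Generic CBF-literature notation; not reproduced verbatim from the paper.
\eta_b(x) \;=\;
  \begin{bmatrix} h(x) & \dot h(x) & \cdots & h^{(r-1)}(x) \end{bmatrix}^{\!\top},
\qquad
\sup_{u \in U} \Big[\, L_f^{r} h(x) \;+\; L_g L_f^{r-1} h(x)\, u \,\Big]
  \;\ge\; -\,K_\alpha \, \eta_b(x)
```

Here L_f and L_g denote Lie derivatives and K_alpha is a 1-by-r gain row vector; choosing K_alpha by pole placement so the associated closed-loop linear system is stable renders the safe set {x : h(x) >= 0} forward invariant, which is what lets the safeguard handle constraints whose derivative does not depend directly on the control input.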
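To make the GP-plus-CBF safeguard concrete, here is a minimal Python sketch of the general idea at relative degree 1: a GP models the unknown residual in the dynamics, and a small QP minimally modifies the RL action so that a CBF condition holds with a confidence margin. All function names, the placeholder data, and the relative-degree-1 simplification are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a GP + CBF safety filter (relative degree 1 for brevity;
# the paper's ECBF construction handles higher relative degrees).
import numpy as np
import cvxpy as cp
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Learn the scalar residual on h-dot from logged data (placeholder data here).
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
X_hist = np.random.randn(50, 2)          # visited states (placeholder)
r_hist = 0.1 * np.sin(X_hist[:, 0])      # observed h-dot residuals (placeholder)
gp.fit(X_hist, r_hist)

def safe_action(x, u_rl, f, g, h, grad_h, alpha=1.0, kappa=2.0):
    """Minimally modify the RL action so a CBF condition holds with margin.

    Enforces: grad_h(x) @ (f(x) + g(x) @ u) + mu(x) + alpha*h(x) >= kappa*sigma(x),
    where (mu, sigma) are the GP posterior mean/std of the unknown residual.
    """
    mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
    dh = grad_h(x)
    u = cp.Variable(len(u_rl))
    cbf = dh @ f(x) + dh @ g(x) @ u + mu[0] + alpha * h(x) >= kappa * sigma[0]
    # Stay as close as possible to the RL action, subject to safety.
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_rl)), [cbf])
    prob.solve()
    return u.value if u.value is not None else u_rl  # fall back if infeasible
```

The confidence margin kappa * sigma(x) is what turns the GP's uncertainty estimate into a probabilistic safety guarantee: the less data the GP has seen near x, the more conservative the filtered action becomes.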
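Similarly, the modular architecture in item (3) can be pictured as one sub-policy per automaton state, switched as the automaton tracks task progress. The toy two-state automaton ("reach the goal, then continue forever"), the random actor, and the 1-D environment below are all hypothetical stand-ins for illustration.

```python
# Toy sketch of the modular decomposition: one sub-policy per automaton state,
# with the active module chosen by the automaton tracking LTL progress.
import random

TRANSITIONS = {            # (automaton state, observed label) -> next state
    ("q0", "none"): "q0",
    ("q0", "goal"): "q1",
    ("q1", "none"): "q1",
    ("q1", "goal"): "q1",
}

class RandomActor:
    """Stand-in for a learned policy-gradient module (e.g., a DDPG actor)."""
    def act(self, s):
        return random.uniform(-1.0, 1.0)

modules = {"q0": RandomActor(), "q1": RandomActor()}  # one module per q

def step_env(s, a):
    """Toy 1-D environment; the label reports the atomic propositions seen."""
    s = s + 0.1 * a
    return s, ("goal" if s >= 1.0 else "none")

q, s = "q0", 0.0
for _ in range(100):
    a = modules[q].act(s)            # module selected by current automaton state
    s, label = step_env(s, a)
    q = TRANSITIONS[(q, label)]      # LTL progress drives the module switch
```

Each module only has to learn the sub-behavior relevant to its automaton state, which is the decomposition benefit the abstract refers to.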
Related papers
- Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability.
We propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy.
arXiv Detail & Related papers (2024-02-04T15:54:03Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Sample-efficient Safe Learning for Online Nonlinear Control with Control
Barrier Functions [35.9713619595494]
Reinforcement learning and continuous nonlinear control have been successfully deployed in multiple domains involving complicated sequential decision-making tasks.
Given the exploratory nature of the learning process and the presence of model uncertainty, it is challenging to apply them to safety-critical control tasks.
We propose a provably efficient episodic safe learning framework for online control tasks.
arXiv Detail & Related papers (2022-07-29T00:54:35Z) - Constrained Reinforcement Learning for Robotics via Scenario-Based
Programming [64.07167316957533]
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
arXiv Detail & Related papers (2022-06-20T07:19:38Z) - Safe RAN control: A Symbolic Reinforcement Learning Approach [62.997667081978825]
We present a Symbolic Reinforcement Learning (SRL) based architecture for safety control of Radio Access Network (RAN) applications.
We provide a purely automated procedure in which a user can specify high-level logical safety specifications for a given cellular network topology.
We introduce a user interface (UI) that helps a user set intent specifications for the system and inspect differences in agent-proposed actions.
arXiv Detail & Related papers (2021-06-03T16:45:40Z) - Learning Off-Policy with Online Planning [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function.
We show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments.
arXiv Detail & Related papers (2020-08-23T16:18:44Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)