Certified Reinforcement Learning with Logic Guidance
- URL: http://arxiv.org/abs/1902.00778v4
- Date: Tue, 6 Jun 2023 15:52:30 GMT
- Title: Certified Reinforcement Learning with Logic Guidance
- Authors: Hosein Hasanbeig, Daniel Kroening, Alessandro Abate
- Abstract summary: We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs)
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
- Score: 78.2286146954051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) is a widely employed machine learning
architecture that has been applied to a variety of control problems. However,
applications in safety-critical domains require a systematic and formal
approach to specifying requirements as tasks or goals. We propose a model-free
RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a
goal for unknown continuous-state/action Markov Decision Processes (MDPs). The
given LTL property is translated into a Limit-Deterministic Generalised Buchi
Automaton (LDGBA), which is then used to shape a synchronous reward function
on-the-fly. Under certain assumptions, the algorithm is guaranteed to
synthesise a control policy whose traces satisfy the LTL specification with
maximal probability.
Related papers
- Regret-Free Reinforcement Learning for LTL Specifications [6.342676126028222]
Reinforcement learning is a promising method to learn optimal control policies for systems with unknown dynamics.
Current RL-based methods offer only guarantees, which provide no insight into the transient performance during the learning phase.
We present the first regret-free online algorithm for learning a controller that addresses the general class of specifications over Markov decision processes.
arXiv Detail & Related papers (2024-11-18T20:01:45Z) - DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL)
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of B"uchia, which explicitly represent the semantics of automat- specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
arXiv Detail & Related papers (2024-10-06T21:30:38Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic B"uchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Decision Processes (MDPs)
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z) - Accelerated Reinforcement Learning for Temporal Logic Control Objectives [10.216293366496688]
This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs)
We propose a novel accelerated model-based reinforcement learning (RL) algorithm for control objectives that is capable of learning control policies significantly faster than related approaches.
arXiv Detail & Related papers (2022-05-09T17:09:51Z) - Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic
through Gaussian Processes and Control Barrier Functions [3.5897534810405403]
Reinforcement learning (RL) is a promising approach and has limited success towards real-world applications.
In this paper, we propose a learning-based control framework consisting of several aspects.
We show such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guard safety with a high probability.
arXiv Detail & Related papers (2021-09-07T00:51:12Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP)
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Reinforcement Learning Based Temporal Logic Control with Maximum
Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO)
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.