Reinforcement Learning Based Temporal Logic Control with Maximum
Probabilistic Satisfaction
- URL: http://arxiv.org/abs/2010.06797v5
- Date: Tue, 5 Oct 2021 16:01:44 GMT
- Title: Reinforcement Learning Based Temporal Logic Control with Maximum
Probabilistic Satisfaction
- Authors: Mingyu Cai, Shaoping Xiao, Baoluo Li, Zhiliang Li and Zhen Kan
- Abstract summary: This paper presents a model-free reinforcement learning algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic specifications.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
- Score: 5.337302350000984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a model-free reinforcement learning (RL) algorithm to
synthesize a control policy that maximizes the satisfaction probability of
linear temporal logic (LTL) specifications. Due to the consideration of
environment and motion uncertainties, we model the robot motion as a
probabilistic labeled Markov decision process with unknown transition
probabilities and unknown probabilistic label functions. The LTL task
specification is converted to a limit-deterministic generalized Büchi
automaton (LDGBA) with several accepting sets to maintain dense rewards during
learning. The novelty of applying the LDGBA is to construct an embedded LDGBA
(E-LDGBA) by designing a synchronous tracking-frontier function, which records
the non-visited accepting sets without increasing the dimensionality or
computational complexity. With appropriately designed state-dependent reward
and discount functions, rigorous analysis shows that any method that optimizes
the expected discounted return of the RL-based approach is guaranteed to find
the optimal policy that maximizes the satisfaction probability of the LTL
specifications. A model-free RL-based motion planning strategy is developed in
this paper to generate the optimal policy. The effectiveness of the RL-based
control synthesis is demonstrated via simulation and experimental results.
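To make the tracking-frontier construction concrete, here is a minimal sketch, assuming the LDGBA is given as automaton states plus a list of accepting sets; the names (track_frontier, e_ldgba_step, acc_sets) are illustrative and not taken from the paper's implementation.

```python
# Minimal sketch of the synchronous tracking-frontier function, assuming a
# hypothetical LDGBA encoding: states are ints, accepting sets are frozensets.
from typing import FrozenSet, List, Tuple

# E-LDGBA state: (automaton state q, frontier T of unvisited accepting-set indices)
EState = Tuple[int, FrozenSet[int]]

def track_frontier(q: int, frontier: FrozenSet[int],
                   acc_sets: List[FrozenSet[int]]) -> FrozenSet[int]:
    """Drop every accepting set that q just visited; reset once all are visited."""
    remaining = frozenset(j for j in frontier if q not in acc_sets[j])
    if not remaining:
        # A full round over all accepting sets is complete: reset the
        # frontier, again discarding the sets that contain the current q.
        remaining = frozenset(j for j in range(len(acc_sets))
                              if q not in acc_sets[j])
    return remaining

def e_ldgba_step(state: EState, q_next: int,
                 acc_sets: List[FrozenSet[int]]) -> Tuple[EState, bool]:
    """Synchronous E-LDGBA transition.

    Returns the next embedded state and a flag that is True exactly when
    q_next lands in an accepting set not yet visited in the current round
    (the event rewarded during learning).
    """
    _, frontier = state
    hit = any(q_next in acc_sets[j] for j in frontier)
    return (q_next, track_frontier(q_next, frontier, acc_sets)), hit
```

Along any run the frontier T is a deterministic function of the states visited so far, so each automaton state pairs with at most one frontier per run; this is, roughly, the sense in which the embedding records non-visited accepting sets without enlarging the effective search space.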
Related papers
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees [17.69385864791265]
We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown system.
We also provide improved theoretical results on choosing the key parameters to ensure optimality.
arXiv Detail & Related papers (2023-05-02T12:57:05Z)
- LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
- Accelerated Reinforcement Learning for Temporal Logic Control Objectives [10.216293366496688]
This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs).
We propose a novel accelerated model-based reinforcement learning (RL) algorithm for control objectives that is capable of learning control policies significantly faster than related approaches.
arXiv Detail & Related papers (2022-05-09T17:09:51Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states (see the reward/discount sketch after this list).
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Reinforcement Learning Based Temporal Logic Control with Soft Constraints Using Limit-deterministic Generalized Büchi Automata [0.0]
We study the control synthesis of motion planning subject to uncertainties.
The uncertainties are considered in robot motion and environment properties, giving rise to the probabilistic labeled Markov decision process (MDP).
arXiv Detail & Related papers (2021-01-25T18:09:11Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward-function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
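Both the paper summarized above and the EP-MDP entry in this list rely on reward and discount functions that depend only on whether the current product state hits an unvisited accepting set. Below is a hedged sketch of that common pattern paired with a tabular Q-learning backup; the constant GAMMA_F and the exact functional forms are assumptions following the usual LDGBA reward-shaping recipe, not code from either paper.

```python
# Sketch of accepting-state-dependent reward and discount with a tabular
# Q-learning update. GAMMA_F and the functional forms are illustrative.
from collections import defaultdict

GAMMA_F = 0.99  # discount applied at accepting states (assumed value)

def reward(hit_accepting: bool) -> float:
    # Positive reward only when an unvisited accepting set is reached.
    return 1.0 - GAMMA_F if hit_accepting else 0.0

def discount(hit_accepting: bool) -> float:
    # Discount < 1 only at accepting states, 1 elsewhere, so the return
    # effectively counts discounted visits to accepting sets.
    return GAMMA_F if hit_accepting else 1.0

Q = defaultdict(lambda: defaultdict(float))  # Q[state][action]

def q_backup(s, a, s_next, hit_accepting, actions_next, alpha=0.1):
    """One tabular Q-learning update with state-dependent reward/discount."""
    r, g = reward(hit_accepting), discount(hit_accepting)
    best_next = max((Q[s_next][b] for b in actions_next), default=0.0)
    Q[s][a] += alpha * (r + g * best_next - Q[s][a])
```

As GAMMA_F approaches 1, maximizing this expected return increasingly favors policies that visit all accepting sets infinitely often, which is the mechanism behind the maximal-satisfaction-probability guarantee stated in the abstract.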
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.