Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic
- URL: http://arxiv.org/abs/2102.12855v1
- Date: Wed, 24 Feb 2021 01:11:25 GMT
- Title: Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic
- Authors: Mingyu Cai, Mohammadhosein Hasanbeig, Shaoping Xiao, Alessandro Abate
and Zhen Kan
- Abstract summary: This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
- Score: 59.94347858883343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the motion planning of autonomous dynamical
systems modeled by Markov decision processes (MDPs) with unknown transition
probabilities over continuous state and action spaces. Linear temporal logic
(LTL) is used to specify high-level tasks over an infinite horizon, which can
be converted into a limit-deterministic generalized Büchi automaton (LDGBA)
with several accepting sets. The novelty is to design an embedded product MDP
(EP-MDP) between the LDGBA and the MDP by incorporating a synchronous
tracking-frontier function that records unvisited accepting sets of the
automaton and facilitates the satisfaction of the accepting conditions. The
proposed LDGBA-based reward shaping and discounting schemes for model-free
reinforcement learning (RL) depend only on the EP-MDP states and overcome the
issue of sparse rewards. Rigorous analysis shows that any RL method that
optimizes the expected discounted return is guaranteed to find an optimal
policy whose traces maximize the satisfaction probability. A modular deep
deterministic policy gradient (DDPG) algorithm is then developed to generate
such policies over continuous state and action spaces. The performance of our
framework is evaluated on an array of OpenAI Gym environments.
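As a concrete illustration of the EP-MDP construction, the Python sketch below
shows one product transition: the synchronous automaton move, the
tracking-frontier update, and a simplified variant of the state-dependent
reward and discount schemes. All identifiers (`env.step`, `env.label`,
`ldgba.delta`, `ldgba.accepting_sets`) and the specific reward/discount
constants are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of one EP-MDP step. The environment and automaton APIs
# are placeholders; the reward/discount values are simplified stand-ins for
# the paper's LDGBA-based shaping and discounting schemes.

def ep_mdp_step(env, ldgba, s, q, frontier, action,
                r_accept=1.0, gamma_accept=0.99, gamma_other=0.9999):
    """One transition of the embedded product MDP (EP-MDP).

    Product state: (s, q, frontier), where `frontier` holds the indices of
    LDGBA accepting sets not yet visited in the current round. Returns the
    next product state plus the shaped reward and the state-dependent
    discount applied on this transition.
    """
    s_next = env.step(s, action)                # unknown dynamics: sampled, not modeled
    q_next = ldgba.delta(q, env.label(s_next))  # synchronous automaton move

    # Tracking-frontier update: drop every accepting set that q_next visits.
    visited = {i for i in frontier if q_next in ldgba.accepting_sets[i]}
    frontier_next = frontier - visited
    if not frontier_next:
        # All accepting sets seen: reset for a new round, reflecting the
        # generalized Buchi condition of visiting each set infinitely often.
        frontier_next = set(range(len(ldgba.accepting_sets)))

    # Reward and discount depend only on the EP-MDP state: reward (with a
    # smaller discount) only when progress is made on the frontier.
    if visited:
        reward, gamma = r_accept, gamma_accept
    else:
        reward, gamma = 0.0, gamma_other

    return (s_next, q_next, frontier_next), reward, gamma
```

Because the shaped reward and discount depend only on the product state
(s, q, frontier), any off-the-shelf continuous-control RL agent, such as the
modular DDPG developed in the paper, can consume these transitions unchanged.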
Related papers
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Learning non-Markovian Decision-Making from State-only Sequences [57.20193609153983]
We develop model-based imitation learning from state-only sequences using a non-Markov Decision Process (nMDP).
We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
arXiv Detail & Related papers (2023-06-27T02:26:01Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological Markov decision processes (TMDPs), obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Model-Free Reinforcement Learning for Optimal Control of Markov Decision Processes Under Signal Temporal Logic Specifications [7.842869080999489]
We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process.
We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
arXiv Detail & Related papers (2021-09-27T22:44:55Z)
- Reinforcement Learning Based Temporal Logic Control with Soft Constraints Using Limit-deterministic Generalized Büchi Automata [0.0]
We study the control synthesis of motion planning subject to uncertainties.
Uncertainties are considered in robot motion and environment properties, giving rise to a probabilistic labeled Markov decision process (MDP).
arXiv Detail & Related papers (2021-01-25T18:09:11Z)
- Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z)
- Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning [1.0928470926399565]
A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed.
A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs.
We present a novel potential-based reward shaping technique to produce dense rewards that speed up learning (a minimal sketch of the underlying scheme appears after this list).
arXiv Detail & Related papers (2020-03-02T08:29:36Z)
- Stochastic Finite State Control of POMDPs with LTL Specifications [14.163899014007647]
Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty.
This paper considers the quantitative problem of synthesizing sub-optimal finite state controllers (sFSCs) for POMDPs.
We propose a bounded policy algorithm, leading to controlled growth in sFSC size, and an anytime algorithm in which the controller's performance improves with successive iterations.
arXiv Detail & Related papers (2020-01-21T18:10:47Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
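The potential-based reward shaping referenced in the Formal Controller
Synthesis entry above builds on the classical scheme of Ng et al. (1999),
which densifies rewards while provably preserving the set of optimal
policies. A minimal sketch follows; the potential function `phi` and all
identifiers are hypothetical placeholders, not that paper's construction.

```python
# Minimal sketch of classical potential-based reward shaping: the shaped
# reward adds F(s, s') = gamma * phi(s') - phi(s), which leaves optimal
# policies unchanged. The potential `phi` is a hypothetical placeholder.

def shaped_reward(reward, s, s_next, phi, gamma=0.99):
    """Return the densified reward r + gamma * phi(s') - phi(s)."""
    return reward + gamma * phi(s_next) - phi(s)

# Example: a potential such as phi(s) = -distance_to_accepting(s), which
# grows as the automaton nears acceptance, yields dense progress rewards.
```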
This list is automatically generated from the titles and abstracts of the papers on this site.