Reinforcement Learning Based Temporal Logic Control with Soft
Constraints Using Limit-deterministic Generalized Büchi Automata
- URL: http://arxiv.org/abs/2101.10284v2
- Date: Sun, 31 Jan 2021 18:16:45 GMT
- Title: Reinforcement Learning Based Temporal Logic Control with Soft
Constraints Using Limit-deterministic Generalized Büchi Automata
- Authors: Mingyu Cai, Shaoping Xiao, and Zhen Kan
- Abstract summary: We study the control synthesis of motion planning subject to uncertainties.
The uncertainties are considered in robot motion and environment properties, giving rise to a probabilistic labeled Markov decision process (MDP).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the control synthesis of motion planning subject to
uncertainties. The uncertainties are considered in robot motion and environment
properties, giving rise to the probabilistic labeled Markov decision process
(MDP). A model-free reinforcement learning (RL) algorithm is developed to
generate a finite-memory control policy that satisfies high-level tasks
expressed as linear temporal logic (LTL) formulas. One of the novelties is to
translate the LTL formula into a limit-deterministic generalized Büchi
automaton (LDGBA) and to develop a corresponding embedded LDGBA (E-LDGBA) by
incorporating a tracking-frontier function, which overcomes the issue of
sparse accepting rewards and improves learning performance without increasing
computational complexity. Due
to potentially conflicting tasks, a relaxed product MDP is developed to allow
the agent to revise its motion plan without strictly following the desired LTL
constraints if the desired tasks can only be partially fulfilled. An expected
return composed of violation rewards and accepting rewards is developed. The
designed violation function quantifies the difference between the revised and
the desired motion plans, while the accepting rewards are designed to
enforce the satisfaction of the acceptance condition of the relaxed product
MDP. Rigorous analysis shows that any RL algorithm optimizing the expected
return is guaranteed to find policies that, in decreasing order of priority,
1) satisfy the acceptance condition of the relaxed product MDP and 2) reduce
the violation cost over long-term behaviors. We also validate the control
synthesis approach via simulation and experimental results.
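To make the tracking-frontier mechanism and the composite reward concrete, below is a minimal Python sketch. It is an illustration under assumed names (`TrackingFrontier`, `R_ACCEPT`, and `violation_cost` are hypothetical, not from the paper): the frontier records which generalized Büchi accepting sets remain unvisited, grants an accepting reward whenever the automaton state enters one of them, resets once every set has been visited so rewards recur each round, and subtracts the designed violation cost.

```python
# Minimal sketch of a tracking-frontier function and a combined
# accepting/violation reward, in the spirit of the E-LDGBA construction.
# All names and reward magnitudes are illustrative assumptions,
# not the authors' implementation.

class TrackingFrontier:
    """Tracks which generalized Buchi accepting sets F_1, ..., F_f
    of the LDGBA have not yet been visited in the current round."""

    def __init__(self, accepting_sets):
        # accepting_sets: iterable of sets of automaton states
        self.accepting_sets = [frozenset(F) for F in accepting_sets]
        self.frontier = set(range(len(self.accepting_sets)))

    def update(self, q):
        """Remove every frontier set that automaton state q belongs to.

        Returns True if q made progress (hit at least one frontier set).
        When the frontier empties, one round of the generalized Buchi
        condition is complete, so it resets; this is what keeps the
        accepting reward from being a one-off, sparse event.
        """
        hit = {j for j in self.frontier if q in self.accepting_sets[j]}
        self.frontier -= hit
        if not self.frontier:
            self.frontier = set(range(len(self.accepting_sets)))
        return bool(hit)


R_ACCEPT = 1.0  # accepting reward magnitude (assumed value)

def reward(frontier, q_next, violation_cost):
    """One-step reward signal on the relaxed product MDP.

    violation_cost >= 0 quantifies how far a relaxed transition deviates
    from the desired LTL constraints (0 when nothing is violated), so
    the expected return trades off acceptance against violation.
    """
    r = R_ACCEPT if frontier.update(q_next) else 0.0
    return r - violation_cost
```

Plugged into any off-the-shelf model-free learner over the relaxed product MDP (tabular Q-learning, for instance), maximizing this return pushes the policy first toward repeatedly completing frontier rounds, i.e., the acceptance condition, and then toward minimizing the accumulated violation cost, matching the priority ordering stated above.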
Related papers
- Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
arXiv Detail & Related papers (2024-10-07T23:38:58Z)
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Accelerated Reinforcement Learning for Temporal Logic Control Objectives [10.216293366496688]
This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs).
We propose a novel accelerated model-based reinforcement learning (RL) algorithm for control objectives, capable of learning control policies significantly faster than related approaches.
arXiv Detail & Related papers (2022-05-09T17:09:51Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
- Stochastic Finite State Control of POMDPs with LTL Specifications [14.163899014007647]
Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty.
This paper considers the quantitative problem of synthesizing sub-optimal finite state controllers (sFSCs) for POMDPs.
We propose a bounded policy iteration algorithm, leading to controlled growth in sFSC size, and an anytime algorithm, where the performance of the controller improves with successive iterations.
arXiv Detail & Related papers (2020-01-21T18:10:47Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)