Learning Minimally-Violating Continuous Control for Infeasible Linear
Temporal Logic Specifications
- URL: http://arxiv.org/abs/2210.01162v2
- Date: Thu, 6 Oct 2022 02:46:50 GMT
- Title: Learning Minimally-Violating Continuous Control for Infeasible Linear
Temporal Logic Specifications
- Authors: Mingyu Cai, Makai Mann, Zachary Serlin, Kevin Leahy, Cristian-Ioan
Vasile
- Abstract summary: This paper explores continuous-time control for target-driven navigation to satisfy complex high-level tasks expressed as linear temporal logic (LTL).
We propose a model-free synthesis framework using deep reinforcement learning (DRL) where the underlying dynamic system is unknown (an opaque box).
- Score: 2.496282558123411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores continuous-time control synthesis for target-driven
navigation to satisfy complex high-level tasks expressed as linear temporal
logic (LTL). We propose a model-free framework using deep reinforcement
learning (DRL) where the underlying dynamic system is unknown (an opaque box).
Unlike prior work, this paper considers scenarios where the given LTL
specification might be infeasible and therefore cannot be accomplished
globally. Instead of modifying the given LTL formula, we provide a general
DRL-based approach to satisfy it with minimal violation.
To do this, we transform a previously multi-objective DRL problem, which
requires simultaneous automata satisfaction and minimum violation cost, into a
single objective. By guiding the DRL agent with a sampling-based path planning
algorithm for the potentially infeasible LTL task, the proposed approach
mitigates the myopic tendencies of DRL, which are often an issue when learning
general LTL tasks that can have long or infinite horizons. This is achieved by
decomposing an infeasible LTL formula into several reach-avoid sub-tasks with
shorter horizons, which can be trained in a modular DRL architecture.
Furthermore, we overcome the challenge of the exploration process for DRL in
complex and cluttered environments by using path planners to design rewards
that are dense in the configuration space. The benefits of the presented
approach are demonstrated through testing on various complex nonlinear systems
and compared with state-of-the-art baselines. A video demonstration is available
on YouTube: https://youtu.be/jBhx6Nv224E.
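As a purely illustrative aid (not the authors' code), the minimal sketch below shows how two ingredients described in the abstract might fit together for a single reach-avoid sub-task obtained from the LTL decomposition: waypoints from a sampling-based planner induce a reward that is dense in the configuration space, and entering an avoid region incurs a soft violation cost rather than ending the episode, so that an infeasible requirement is violated as little as possible. The straight-line planner stub, the 2-D point-mass setting, and all names are assumptions made for illustration only.

```python
# Hypothetical sketch of a planner-guided dense reward for one reach-avoid
# sub-task; the "planner" and all parameters are illustrative stand-ins.
import numpy as np

def plan_waypoints(start, goal, n=20):
    # Stand-in for a sampling-based planner (e.g. RRT*): a straight-line path here.
    return np.linspace(start, goal, n)

def dense_reward(state, waypoints, avoid_regions,
                 goal_tol=0.1, avoid_radius=0.3, violation_cost=5.0):
    # Progress term: negative distance to the closest waypoint, dense everywhere.
    r = -np.linalg.norm(waypoints - state, axis=1).min()
    # Reach bonus when the sub-task goal (last waypoint) is attained.
    if np.linalg.norm(state - waypoints[-1]) < goal_tol:
        r += 10.0
    # Soft violation cost for entering an avoid region, instead of a hard failure,
    # so a policy for an infeasible spec is steered toward minimal violation.
    for region in avoid_regions:
        if np.linalg.norm(state - region) < avoid_radius:
            r -= violation_cost
    return r

# One sub-task of a decomposed LTL formula: reach (1, 1) while avoiding (0.5, 0.5).
start, goal = np.array([0.0, 0.0]), np.array([1.0, 1.0])
waypoints = plan_waypoints(start, goal)
print(dense_reward(np.array([0.45, 0.50]), waypoints, [np.array([0.5, 0.5])]))
```

In the modular architecture the abstract describes, one DRL policy would be trained per such sub-task against a reward of this kind, with the sub-policies sequenced according to the automaton tracking progress through the (possibly infeasible) LTL specification.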
Related papers
- Navigation with QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning [17.760679318994384]
We present a novel hierarchical transformer-based approach leveraging a learned quantizer of the space.
This quantization enables the training of a simpler zone-conditioned low-level policy and simplifies planning.
Our proposed approach achieves state-of-the-art results in complex long-distance navigation environments.
arXiv Detail & Related papers (2024-11-12T12:49:41Z) - DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL).
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
arXiv Detail & Related papers (2024-10-06T21:30:38Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z) - Signal Temporal Logic Neural Predictive Control [15.540490027770621]
We propose a method to learn a neural network controller to satisfy the requirements specified in signal temporal logic (STL).
Our controller learns to roll out trajectories to maximize the STL robustness score in training.
A backup policy is designed to ensure safety when our controller fails.
arXiv Detail & Related papers (2023-09-10T20:31:25Z) - LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly
Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Reinforcement Learning Based Temporal Logic Control with Maximum
Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z) - Continuous Motion Planning with Temporal Logic Specifications using Deep
Neural Networks [16.296473750342464]
We propose a model-free reinforcement learning method to synthesize control policies for motion planning problems.
The robot is modelled as a discrete-time Markov decision process (MDP) with continuous state and action spaces.
We train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method.
arXiv Detail & Related papers (2020-04-02T17:58:03Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)