Learning Minimally-Violating Continuous Control for Infeasible Linear
Temporal Logic Specifications
- URL: http://arxiv.org/abs/2210.01162v2
- Date: Thu, 6 Oct 2022 02:46:50 GMT
- Title: Learning Minimally-Violating Continuous Control for Infeasible Linear
Temporal Logic Specifications
- Authors: Mingyu Cai, Makai Mann, Zachary Serlin, Kevin Leahy, Cristian-Ioan
Vasile
- Abstract summary: This paper explores continuous-time control for target-driven navigation to satisfy complex high-level tasks expressed as linear temporal logic (LTL).
We propose a model-free synthesis framework using deep reinforcement learning (DRL) where the underlying dynamic system is unknown (an opaque box).
- Score: 2.496282558123411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores continuous-time control synthesis for target-driven
navigation to satisfy complex high-level tasks expressed as linear temporal
logic (LTL). We propose a model-free framework using deep reinforcement
learning (DRL) where the underlying dynamic system is unknown (an opaque box).
Unlike prior work, this paper considers scenarios where the given LTL
specification might be infeasible and therefore cannot be accomplished
globally. Instead of modifying the given LTL formula, we provide a general
DRL-based approach to satisfy it with minimal violation.
To do this, we transform a previously multi-objective DRL problem, which
requires simultaneous automata satisfaction and minimum violation cost, into a
single objective. By guiding the DRL agent with a sampling-based path planning
algorithm for the potentially infeasible LTL task, the proposed approach
mitigates the myopic tendencies of DRL, which are often an issue when learning
general LTL tasks that can have long or infinite horizons. This is achieved by
decomposing an infeasible LTL formula into several reach-avoid sub-tasks with
shorter horizons, which can be trained in a modular DRL architecture.
Furthermore, we overcome the challenge of the exploration process for DRL in
complex and cluttered environments by using path planners to design rewards
that are dense in the configuration space. The benefits of the presented
approach are demonstrated through testing on various complex nonlinear systems
and compared with state-of-the-art baselines. A video demonstration is available
on YouTube: https://youtu.be/jBhx6Nv224E.
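As a purely illustrative aid (not the authors' code), the minimal sketch below shows how two ingredients described in the abstract might fit together for a single reach-avoid sub-task obtained from the LTL decomposition: waypoints from a sampling-based planner induce a reward that is dense in the configuration space, and entering an avoid region incurs a soft violation cost rather than ending the episode, so that an infeasible requirement is violated as little as possible. The straight-line planner stub, the 2-D point-mass setting, and all names are assumptions made for illustration only.

```python
# Hypothetical sketch of a planner-guided dense reward for one reach-avoid
# sub-task; the "planner" and all parameters are illustrative stand-ins.
import numpy as np

def plan_waypoints(start, goal, n=20):
    # Stand-in for a sampling-based planner (e.g. RRT*): a straight-line path here.
    return np.linspace(start, goal, n)

def dense_reward(state, waypoints, avoid_regions,
                 goal_tol=0.1, avoid_radius=0.3, violation_cost=5.0):
    # Progress term: negative distance to the closest waypoint, dense everywhere.
    r = -np.linalg.norm(waypoints - state, axis=1).min()
    # Reach bonus when the sub-task goal (last waypoint) is attained.
    if np.linalg.norm(state - waypoints[-1]) < goal_tol:
        r += 10.0
    # Soft violation cost for entering an avoid region, instead of a hard failure,
    # so a policy for an infeasible spec is steered toward minimal violation.
    for region in avoid_regions:
        if np.linalg.norm(state - region) < avoid_radius:
            r -= violation_cost
    return r

# One sub-task of a decomposed LTL formula: reach (1, 1) while avoiding (0.5, 0.5).
start, goal = np.array([0.0, 0.0]), np.array([1.0, 1.0])
waypoints = plan_waypoints(start, goal)
print(dense_reward(np.array([0.45, 0.50]), waypoints, [np.array([0.5, 0.5])]))
```

In the modular architecture the abstract describes, one DRL policy would be trained per such sub-task against a reward of this kind, with the sub-policies sequenced according to the automaton tracking progress through the (possibly infeasible) LTL specification.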
Related papers
- Navigation with QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning [17.760679318994384]
We present a novel hierarchical transformer-based approach leveraging a learned quantizer of the space.
This quantization enables the training of a simpler zone-conditioned low-level policy and simplifies planning.
Our proposed approach achieves state-of-the-art results in complex long-distance navigation environments.
arXiv Detail & Related papers (2024-11-12T12:49:41Z) - DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL).
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
arXiv Detail & Related papers (2024-10-06T21:30:38Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z) - Signal Temporal Logic Neural Predictive Control [15.540490027770621]
We propose a method to learn a neural network controller to satisfy the requirements specified in signal temporal logic (STL).
Our controller learns to roll out trajectories to maximize the STL robustness score in training.
A backup policy is designed to ensure safety when our controller fails.
arXiv Detail & Related papers (2023-09-10T20:31:25Z) - LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly
Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Reinforcement Learning Based Temporal Logic Control with Maximum
Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z) - Continuous Motion Planning with Temporal Logic Specifications using Deep
Neural Networks [16.296473750342464]
We propose a model-free reinforcement learning method to synthesize control policies for motion planning problems.
The robot is modelled as a discrete-time Markov decision process (MDP) with continuous state and action spaces.
We train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method.
arXiv Detail & Related papers (2020-04-02T17:58:03Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)