Reinforcement learning with timed constraints for robotics motion planning
- URL: http://arxiv.org/abs/2601.00087v1
- Date: Wed, 31 Dec 2025 19:43:44 GMT
- Title: Reinforcement learning with timed constraints for robotics motion planning
- Authors: Zhaoan Wang, Junchao Li, Mahdi Mohammad, Shaoping Xiao
- Abstract summary: This paper presents a unified automata-based framework for synthesizing policies in both Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs). A simple yet expressive reward structure enforces temporal correctness while allowing additional performance objectives. Results demonstrate that the proposed framework consistently learns policies that satisfy strict time-bounded requirements under stochastic transitions, scales to larger state spaces, and remains effective in partially observable environments.
- Score: 0.5436465344481877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic systems operating in dynamic and uncertain environments increasingly require planners that satisfy complex task sequences while adhering to strict temporal constraints. Metric Interval Temporal Logic (MITL) offers a formal and expressive framework for specifying such time-bounded requirements; however, integrating MITL with reinforcement learning (RL) remains challenging due to stochastic dynamics and partial observability. This paper presents a unified automata-based RL framework for synthesizing policies in both Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) under MITL specifications. MITL formulas are translated into Timed Limit-Deterministic Generalized Büchi Automata (Timed-LDGBA) and synchronized with the underlying decision process to construct product timed models suitable for Q-learning. A simple yet expressive reward structure enforces temporal correctness while allowing additional performance objectives. The approach is validated in three simulation studies: a $5 \times 5$ grid-world formulated as an MDP, a $10 \times 10$ grid-world formulated as a POMDP, and an office-like service-robot scenario. Results demonstrate that the proposed framework consistently learns policies that satisfy strict time-bounded requirements under stochastic transitions, scales to larger state spaces, and remains effective in partially observable environments, highlighting its potential for reliable robotic planning in time-critical and uncertain settings.
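The product construction described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: a hand-written timed automaton for the single requirement "reach the goal within a deadline" stands in for a Timed-LDGBA translated from MITL, and the names `DEADLINE`, `SLIP`, `automaton_step`, and `product_step`, along with the reward values of +1 and -1, are assumptions made for illustration. Tabular Q-learning then runs over product states of the form (grid cell, automaton state, clock value).

```python
import random

SIZE = 5                      # 5x5 grid, matching the size of the MDP study
START, GOAL = (0, 0), (4, 4)
DEADLINE = 12                 # hypothetical time bound: reach GOAL within 12 steps
SLIP = 0.1                    # hypothetical chance that a random action is executed instead
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def env_step(pos, action, rng):
    """Stochastic grid move; positions are clipped at the grid walls."""
    if rng.random() < SLIP:
        action = rng.choice(ACTIONS)
    x = min(max(pos[0] + action[0], 0), SIZE - 1)
    y = min(max(pos[1] + action[1], 0), SIZE - 1)
    return (x, y)


def automaton_step(q, clock, pos):
    """Illustrative timed automaton: accept if GOAL is reached by the deadline,
    reject when the deadline step passes without reaching GOAL."""
    if q != "run":
        return q, clock
    clock += 1
    if pos == GOAL and clock <= DEADLINE:
        return "acc", clock
    if clock >= DEADLINE:
        return "rej", clock
    return "run", clock


def product_step(state, action, rng):
    """Synchronize the environment and the automaton into one product transition."""
    pos, q, clock = state
    pos = env_step(pos, action, rng)
    q, clock = automaton_step(q, clock, pos)
    reward = 1.0 if q == "acc" else (-1.0 if q == "rej" else 0.0)
    return (pos, q, clock), reward, q != "run"


def train(episodes=20000, alpha=0.1, gamma=0.99, eps=0.2, seed=0):
    """Tabular epsilon-greedy Q-learning over product states."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        state, done = (START, "run", 0), False
        while not done:
            qs = Q.setdefault(state, [0.0] * len(ACTIONS))
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=qs.__getitem__)
            nxt, r, done = product_step(state, ACTIONS[a], rng)
            future = 0.0 if done else gamma * max(Q.setdefault(nxt, [0.0] * len(ACTIONS)))
            qs[a] += alpha * (r + future - qs[a])
            state = nxt
    return Q


def success_rate(Q, trials=1000, seed=1):
    """Fraction of greedy rollouts that end in the accepting automaton state."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        state, done = (START, "run", 0), False
        while not done:
            qs = Q.get(state, [0.0] * len(ACTIONS))
            a = max(range(len(ACTIONS)), key=qs.__getitem__)
            state, _, done = product_step(state, ACTIONS[a], rng)
        wins += state[1] == "acc"
    return wins / trials


if __name__ == "__main__":
    print(success_rate(train()))
```

Because the clock is part of the product state, the learned policy can depend on how much time remains, which a policy over grid cells alone could not express. Every episode ends within `DEADLINE` steps, since the automaton either accepts or rejects by then.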
Related papers
- LLM-Grounded Dynamic Task Planning with Hierarchical Temporal Logic for Human-Aware Multi-Robot Collaboration [17.886091169216538]
Large Language Models (LLMs) enable non-experts to specify open-world multi-robot tasks. LLM plans often lack feasibility and are not efficient, especially in long-horizon scenarios. We propose a neuro-symbolic framework that grounds reasoning into hierarchical specifications.
arXiv Detail & Related papers (2026-02-10T07:11:36Z)
- Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time [52.03682298194168]
This paper proposes an integration of temporal logical reasoning and Partially Observable Markov Decision Processes (POMDPs). Our method leverages a fragment of Linear Temporal Logic (LTL) based on Event Calculus (EC) to generate persistent (i.e., constant) macro-actions. These macro-actions guide Monte Carlo Tree Search (MCTS)-based POMDP solvers over a time horizon.
arXiv Detail & Related papers (2025-05-06T16:08:55Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Constant-time Motion Planning with Anytime Refinement for Manipulation [17.543746580669662]
We propose an anytime refinement approach that works in combination with constant-time motion planning (CTMP) algorithms.
Our proposed framework, as it operates as a constant time algorithm, rapidly generates an initial solution within a user-defined time threshold.
Functioning as an anytime algorithm, it iteratively refines the solution's quality within the allocated time budget.
arXiv Detail & Related papers (2023-11-01T20:40:10Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
- Formal Controller Synthesis for Markov Jump Linear Systems with Uncertain Dynamics [64.72260320446158]
We propose a method for synthesising controllers for Markov jump linear systems.
Our method is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS.
We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.
arXiv Detail & Related papers (2022-12-01T17:36:30Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Reinforcement Learning Based Temporal Logic Control with Soft Constraints Using Limit-deterministic Generalized Buchi Automata [0.0]
We study the control synthesis of motion planning subject to uncertainties.
The uncertainties are considered in robot motion and environment properties, giving rise to the probabilistic labeled Markov decision process (MDP).
arXiv Detail & Related papers (2021-01-25T18:09:11Z)
- Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning [1.0928470926399565]
A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed.
A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs.
We present a novel potential-based reward shaping technique to produce dense rewards to speed up learning.
arXiv Detail & Related papers (2020-03-02T08:29:36Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)