One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning
- URL: http://arxiv.org/abs/2508.01561v1
- Date: Sun, 03 Aug 2025 03:17:49 GMT
- Title: One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning
- Authors: Zijian Guo, İlker Işık, H. M. Sabbir Ahmad, Wenchao Li,
- Abstract summary: Generalizing to complex and temporally extended task objectives and safety constraints is a critical challenge in reinforcement learning (RL). In this paper, we introduce GenZ-LTL, a method that enables zero-shot generalization to arbitrary LTL specifications.
- Score: 3.5886171069912938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalizing to complex and temporally extended task objectives and safety constraints remains a critical challenge in reinforcement learning (RL). Linear temporal logic (LTL) offers a unified formalism to specify such requirements, yet existing methods are limited in their abilities to handle nested long-horizon tasks and safety constraints, and cannot identify situations when a subgoal is not satisfiable and an alternative should be sought. In this paper, we introduce GenZ-LTL, a method that enables zero-shot generalization to arbitrary LTL specifications. GenZ-LTL leverages the structure of Büchi automata to decompose an LTL task specification into sequences of reach-avoid subgoals. Contrary to the current state-of-the-art method that conditions on subgoal sequences, we show that it is more effective to achieve zero-shot generalization by solving these reach-avoid problems one subgoal at a time through proper safe RL formulations. In addition, we introduce a novel subgoal-induced observation reduction technique that can mitigate the exponential complexity of subgoal-state combinations under realistic assumptions. Empirical results show that GenZ-LTL substantially outperforms existing methods in zero-shot generalization to unseen LTL specifications.
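To make the "one subgoal at a time" idea concrete, here is a minimal sketch of decomposing a Büchi automaton into reach-avoid subgoals and advancing through them sequentially. The automaton, the `Subgoal` structure, and the episode loop are illustrative stand-ins, not the paper's actual implementation; the toy automaton, propositions (`a`, `b`, `c`), and state names are assumptions made for the example.

```python
# Hypothetical sketch of the "one subgoal at a time" idea: a Büchi
# automaton for an LTL task is decomposed into reach-avoid subgoals,
# and the agent handles exactly one (reach, avoid) pair at a time.
from dataclasses import dataclass


@dataclass(frozen=True)
class Subgoal:
    reach: str        # proposition that must become true
    avoid: frozenset  # propositions that must stay false throughout


# Toy automaton for "eventually a, then eventually b", with a safety
# constraint "always not c" folded into the avoid sets.
AUTOMATON = {
    "q0": (Subgoal(reach="a", avoid=frozenset({"c"})), "q1"),
    "q1": (Subgoal(reach="b", avoid=frozenset({"c"})), "q_acc"),
}
ACCEPTING = "q_acc"


def run_episode(q, label_sequence):
    """Advance the automaton over a sequence of label sets, one subgoal
    at a time; return (final_state, satisfied, safety_violated)."""
    for labels in label_sequence:
        if q == ACCEPTING:
            break
        subgoal, next_q = AUTOMATON[q]
        if subgoal.avoid & labels:    # an avoid proposition became true
            return q, False, True
        if subgoal.reach in labels:   # current subgoal reached: advance
            q = next_q
    return q, q == ACCEPTING, False
```

In a full system, the inner loop would query a goal-conditioned safe-RL policy on the current `Subgoal` alone, which is what lets a single policy generalize zero-shot to unseen specifications: any new automaton reduces to a sequence of the same reach-avoid problems.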
Related papers
- Toward Robust Non-Transferable Learning: A Survey and Benchmark [51.52542476904985]
Non-transferable learning (NTL) is a task aimed at reshaping the generalization abilities of deep learning models. We present the first comprehensive survey on NTL and introduce NTLBench, the first benchmark to evaluate NTL performance and robustness. We discuss the practical applications of NTL, along with its future directions and associated challenges.
arXiv Detail & Related papers (2025-02-19T10:12:19Z)
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks. Existing approaches suffer from several shortcomings. We propose a novel learning approach to address these concerns.
arXiv Detail & Related papers (2024-10-06T21:30:38Z)
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning. We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging. We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
- Learning Minimally-Violating Continuous Control for Infeasible Linear Temporal Logic Specifications [2.496282558123411]
This paper explores continuous-time control for target-driven navigation that satisfies complex high-level tasks expressed in linear temporal logic (LTL).
We propose a model-free synthesis framework using deep reinforcement learning (DRL) in which the underlying dynamical system is unknown (an opaque box).
arXiv Detail & Related papers (2022-10-03T18:32:20Z)
- Reinforcement Learning for General LTL Objectives Is Intractable [10.69663517250214]
We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework.
Our result implies it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy.
arXiv Detail & Related papers (2021-11-24T18:26:13Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.