Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines
- URL: http://arxiv.org/abs/2510.27329v1
- Date: Fri, 31 Oct 2025 10:00:57 GMT
- Title: Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines
- Authors: Kristina Levina, Nikolaos Pappas, Athanasios Karapantelakis, Aneta Vulgarakis Feljan, Jendrik Seipp
- Abstract summary: Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. Learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. We introduce three generalisations of RMs: (1) Numeric RMs allow users to express complex tasks in a compact form; (2) Agenda RMs associate states with an agenda that tracks the remaining subtasks; and (3) coupled RMs have coupled states associated with each subtask in the agenda.
- Score: 6.644469604216879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks because agents with access to RMs can learn more efficiently from fewer samples. However, learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. In such cases, the amount of information to learn increases exponentially with the number of unordered subtasks. In this work, we address this limitation by introducing three generalisations of RMs: (1) Numeric RMs allow users to express complex tasks in a compact form. (2) In Agenda RMs, states are associated with an agenda that tracks the remaining subtasks to complete. (3) Coupled RMs have coupled states associated with each subtask in the agenda. Furthermore, we introduce a new compositional learning algorithm that leverages coupled RMs: Q-learning with coupled RMs (CoRM). Our experiments show that CoRM scales better than state-of-the-art RM algorithms for long-horizon problems with unordered subtasks.
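The core construction behind RM-based learning can be sketched in a few lines: the agent learns a Q-function over pairs of environment state and RM state, so the non-Markovian task reward becomes Markovian on the product. The sketch below is illustrative only, with an invented two-subtask RM, a toy line-world, and arbitrary hyperparameters; it is plain tabular Q-learning over the product (in the spirit of standard Q-learning with RMs), not the paper's CoRM algorithm.

```python
import random

class RewardMachine:
    # Boolean reward machine for a hypothetical two-subtask sequence:
    # observe event 'a', then event 'b'; reward 1.0 on completion.
    TRANSITIONS = {(0, 'a'): (1, 0.0), (1, 'b'): (2, 1.0)}
    TERMINAL = {2}

    def step(self, u, event):
        # Unlabelled events leave the RM state unchanged with zero reward.
        return self.TRANSITIONS.get((u, event), (u, 0.0))

# Toy line-world (invented): 5 cells; cell 1 emits 'a', cell 3 emits 'b'.
LABELS = {1: 'a', 3: 'b'}
N_CELLS = 5
ACTIONS = (1, -1)  # move right / left

def train(episodes=2000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    rm = RewardMachine()
    Q = {}  # (env_cell, rm_state, action) -> value: the product state space
    q = lambda s, u, a: Q.get((s, u, a), 0.0)
    for _ in range(episodes):
        s, u = 0, 0
        for _ in range(50):
            a = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda x: q(s, u, x)))
            s2 = min(max(s + a, 0), N_CELLS - 1)
            u2, r = rm.step(u, LABELS.get(s2))
            done = u2 in rm.TERMINAL
            target = r if done else r + gamma * max(q(s2, u2, b) for b in ACTIONS)
            Q[(s, u, a)] = (1 - alpha) * q(s, u, a) + alpha * target
            s, u = s2, u2
            if done:
                break
    return Q, rm

def greedy_return(Q, rm):
    # Roll out the greedy policy and accumulate the RM reward.
    q = lambda s, u, a: Q.get((s, u, a), 0.0)
    s, u, total = 0, 0, 0.0
    for _ in range(50):
        a = max(ACTIONS, key=lambda x: q(s, u, x))
        s = min(max(s + a, 0), N_CELLS - 1)
        u, r = rm.step(u, LABELS.get(s))
        total += r
        if u in rm.TERMINAL:
            break
    return total
```

Because the RM state is part of the Q-table key, the same environment cell can carry different values before and after a subtask completes, which is exactly the blow-up the abstract describes: with k unordered subtasks a Boolean RM needs on the order of 2^k such states, which is what the paper's numeric, agenda, and coupled generalisations compress.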
Related papers
- FORM: Learning Expressive and Transferable First-Order Logic Reward Machines [48.36822060760614]
Reward machines (RMs) are an effective approach for addressing non-Markovian rewards in reinforcement learning. We propose First-Order Reward Machines (FORMs), which use first-order logic to label edges. We show that FORMs can be effectively learnt for tasks where traditional RM learning approaches fail.
arXiv Detail & Related papers (2024-12-31T09:31:15Z)
- LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits [73.26238057915583]
We introduce LASeR, which frames reward model selection as a multi-armed bandit problem. We show that LASeR boosts iterative training, improving the absolute average accuracy of Llama-3-8B over three datasets. We also show that LASeR leads to a 72.69% AlpacaEval win rate over the RM score ensemble baseline.
arXiv Detail & Related papers (2024-10-02T16:46:38Z)
- Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines [5.600971575680638]
We study cooperative Multi-Agent Reinforcement Learning (MARL) problems using Reward Machines (RMs).
We present Multi-Agent Reinforcement Learning with a hierarchy of RMs (MAHRM) that is capable of dealing with more complex scenarios.
Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods using the same prior knowledge of high-level events.
arXiv Detail & Related papers (2024-03-08T06:38:22Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- Hierarchies of Reward Machines [75.55324974788475]
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine.
We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs.
arXiv Detail & Related papers (2022-05-31T12:39:24Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.