Fugu-MT Paper Translation (Abstract): Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines

Paper overview: Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines

arxiv url: http://arxiv.org/abs/2510.27329v1
Date: Fri, 31 Oct 2025 10:00:57 GMT
Status: Translation complete
Last updated in system: 2025-11-03 17:52:16.059351
Title: Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines
Title (reference translation): Reinforcement learning for long-horizon unordered tasks: from Boolean to coupled reward machines
Authors: Kristina Levina, Nikolaos Pappas, Athanasios Karapantelakis, Aneta Vulgarakis Feljan, Jendrik Seipp
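Abstract summary: Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. Learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. This work introduces three generalisations of RMs: (1) Numeric RMs, which allow complex tasks to be expressed in a compact form; (2) Agenda RMs, whose states are associated with an agenda that tracks the remaining subtasks; and (3) Coupled RMs, which have coupled states associated with each subtask.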
Reference score (independently computed attention measure): 6.644469604216879
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks because agents with access to RMs can learn more efficiently from fewer samples. However, learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. In such cases, the amount of information to learn increases exponentially with the number of unordered subtasks. In this work, we address this limitation by introducing three generalisations of RMs: (1) Numeric RMs allow users to express complex tasks in a compact form. (2) In Agenda RMs, states are associated with an agenda that tracks the remaining subtasks to complete. (3) Coupled RMs have coupled states associated with each subtask in the agenda. Furthermore, we introduce a new compositional learning algorithm that leverages coupled RMs: Q-learning with coupled RMs (CoRM). Our experiments show that CoRM scales better than state-of-the-art RM algorithms for long-horizon problems with unordered subtasks.
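Abstract (reference translation): Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks, because agents with access to RMs can learn more efficiently from fewer samples. However, learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. In such cases, the amount of information to learn grows exponentially with the number of unordered subtasks. This work addresses the limitation by introducing three generalisations of RMs: (1) Numeric RMs allow complex tasks to be expressed in a compact form. (2) In Agenda RMs, states are associated with an agenda that tracks the subtasks remaining to be completed. (3) Coupled RMs have coupled states associated with each subtask in the agenda. In addition, the authors propose a new compositional learning algorithm that leverages coupled RMs: Q-learning with coupled RMs (CoRM). Experiments show that CoRM scales better than state-of-the-art RM algorithms on long-horizon problems with unordered subtasks.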
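The exponential growth mentioned in the abstract can be illustrated with a small sketch. This is not the paper's formalism: the abstract gives no formal definitions, so the helper names `boolean_rm_states` and `agenda_step`, the event labels, and the reward of 1.0 per completed subtask are all assumptions made for illustration. The sketch assumes that a plain RM for n unordered subtasks needs one state per subset of completed subtasks (2^n states), and contrasts this with a single agenda that records the remaining subtasks.

```python
from itertools import combinations


def boolean_rm_states(subtasks):
    """Enumerate one state per subset of completed subtasks (assumed RM encoding)."""
    items = sorted(subtasks)
    return [
        frozenset(done)
        for size in range(len(items) + 1)
        for done in combinations(items, size)
    ]


def agenda_step(agenda, event, reward_per_subtask=1.0):
    """Drop a completed subtask from the agenda; return (new_agenda, reward).

    The reward value is a hypothetical choice, not taken from the paper.
    """
    if event in agenda:
        return agenda - {event}, reward_per_subtask
    return agenda, 0.0


if __name__ == "__main__":
    subtasks = {"a", "b", "c"}
    # Number of subset-states grows as 2**n with the number of subtasks.
    print(len(boolean_rm_states(subtasks)))

    # The agenda shrinks as subtasks complete, in whatever order they occur.
    agenda = frozenset(subtasks)
    for event in ["b", "x", "a", "c"]:
        agenda, reward = agenda_step(agenda, event)
        print(event, sorted(agenda), reward)
```

The agenda here is only a bookkeeping device for the remaining subtasks; how Agenda RMs and Coupled RMs attach states and Q-functions to it is described in the paper itself, not in this sketch.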

Related papers