Fugu-MT Paper Translation (Overview): Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

Paper overview: Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

arxiv url: http://arxiv.org/abs/2605.13207v1
Date: Wed, 13 May 2026 08:58:33 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:27.930164
Title: Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning
Authors: Stefan Stojanovic, Alexandre Proutiere
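Abstract summary: We introduce switching successor measures, an extension of successor measures that enables hierarchical control in reinforcement learning. We show that switching successor measures arise naturally from classical successor measures while preserving their structure. FB $π$-Switch improves over non-hierarchical baselines.
Reference score (independently computed attention score): 49.24483784910263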
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hierarchical reinforcement learning can improve generalization by decomposing long-horizon decision-making into simpler subproblems. However, existing approaches often rely on restrictive design choices, such as fixed temporal abstractions or goal-conditioned objectives, which largely confine them to goal-reaching tasks and limit their applicability to general reward functions. In this paper, we introduce switching successor measures, an extension of successor measures that enables hierarchical control in zero-shot reinforcement learning without additional supervision, fixed horizons, or manually designed subgoals. We show that switching successor measures arise naturally from classical successor measures while preserving their underlying structure. Building on this result, we propose FB $π$-Switch, an algorithm that extracts both a high-level subgoal-selection policy and a low-level control policy directly from forward-backward (FB) representations, allowing hierarchical behavior to emerge from a single learned representation. Experiments on both goal-conditioned and general reward-based tasks show that FB $π$-Switch improves over non-hierarchical baselines and matches state-of-the-art hierarchical methods in goal-conditioned settings. These results demonstrate that structured successor representations provide a flexible foundation for hierarchical zero-shot reinforcement learning beyond goal-reaching tasks. Our project website is available at: https://stestokth.github.io/switching-successors/.
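The abstract says that policies are extracted from forward-backward (FB) representations but does not give the mechanics. As a rough illustration only, the sketch below shows the standard, non-hierarchical FB zero-shot recipe: a task embedding is estimated as the average of r(s) B(s) over sampled states, and the action maximizing F(s, a, z)^T z is chosen. This is not the FB $π$-Switch algorithm. The function names, array shapes, and toy embeddings are hypothetical, and the embeddings are plain arrays standing in for learned networks.

```python
import numpy as np

def infer_task_embedding(B_states, rewards):
    """Estimate z_r as the sample mean of r(s_i) * B(s_i).

    B_states: (n, d) backward embeddings B(s_i) of sampled states (assumed given).
    rewards:  (n,)   rewards r(s_i) at those states.
    """
    return (rewards[:, None] * B_states).mean(axis=0)

def greedy_action(F_sa, z):
    """Pick argmax_a F(s, a, z)^T z for a single state.

    F_sa: (num_actions, d) forward embeddings for each action (assumed given).
    """
    return int(np.argmax(F_sa @ z))

# Toy usage with made-up embeddings (illustrative values only).
B_states = np.array([[1.0, 0.0], [0.0, 1.0]])
rewards = np.array([2.0, 0.0])
z = infer_task_embedding(B_states, rewards)

F_sa = np.array([[0.0, 1.0], [1.0, 0.0]])
action = greedy_action(F_sa, z)
```

In this toy setup the reward is concentrated on the first sampled state, so the inferred embedding points along the first coordinate and the greedy rule prefers the action whose forward embedding aligns with it. How a high-level subgoal-selection policy would sit on top of this is specific to the paper and is not sketched here.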

Related papers