Fugu-MT Paper Translation (Abstract): Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization

Paper overview: Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization

arxiv url: http://arxiv.org/abs/2605.06145v1
Date: Thu, 07 May 2026 12:40:49 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.781793
Title: Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization
Title (reference translation): Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control Maximization
Authors: Alireza Modirshanechi, Benjamin Eysenbach, Peter Dayan, Eric Schulz
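Abstract summary: Unsupervised pretraining has driven empirical advances in goal-conditioned reinforcement learning (GCRL). In particular, an influential class of methods, mutual information skill learning (MISL), discovers behaviorally diverse skills that can later be used for downstream goal-reaching. Why skills learned through MISL should support goal-reaching remains a theoretical mystery.
Reference score (independently computed attention score): 41.30196546270599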
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unsupervised pretraining has driven empirical advances in goal-conditioned reinforcement learning (GCRL), but its theoretical foundations remain poorly understood. In particular, an influential class of methods, mutual information skill learning (MISL), discovers behaviorally diverse skills that can later be used for downstream goal-reaching. However, it remains a theoretical mystery why skills learned through MISL should support goal-reaching. A subtle challenge is that both GCRL and MISL are umbrella terms: different GCRL tasks use distinct criteria for measuring goal-reaching performance, while different MISL methods optimize distinct notions of behavioral diversity. We address this challenge and unify GCRL and MISL as instances of control maximization. We identify three canonical GCRL formulations and prove that they are fundamentally inequivalent: they can induce incompatible optimal policies even in the same environment. Nevertheless, they all share a common interpretation: a well-performing goal-conditioned policy is one whose future trajectory is highly sensitive to the commanded goal, with the precise notion of sensitivity determined by the GCRL formulation. Noting that MISL objectives can be understood as measures of skill-sensitivity akin to goal-sensitivity, we show that MISL objectives are bounded by formulation-specific downstream goal-sensitivities. These bounds establish a precise correspondence between MISL methods and downstream GCRL tasks: for every GCRL formulation, there exists a matching MISL objective for which more diverse skills afford greater downstream goal sensitivity. Our results thus lay a theoretical foundation for RL pretraining and have important practical implications, such as suggesting which pretraining objectives to use when a user cares about a specific class of downstream tasks.
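Abstract (reference translation): Unsupervised pretraining has brought empirical progress in goal-conditioned reinforcement learning (GCRL), but its theoretical foundations are still not well understood. In particular, an influential class of methods called mutual information skill learning (MISL) discovers behaviorally diverse skills that can later be used for downstream goal-reaching. However, why skills learned with MISL should support goal-reaching is a theoretical mystery. A subtle difficulty is that both GCRL and MISL are umbrella terms: different GCRL tasks use different criteria for goal-reaching performance, and different MISL methods optimize different notions of behavioral diversity. We address this difficulty and unify GCRL and MISL as instances of control maximization. We identify three canonical GCRL formulations and prove that they are fundamentally inequivalent: they can induce incompatible optimal policies even in the same environment. Nevertheless, they share a common interpretation: a well-performing goal-conditioned policy is one whose future trajectory is highly sensitive to the commanded goal, with the precise notion of sensitivity determined by the GCRL formulation. Noting that MISL objectives can be understood as measures of skill sensitivity analogous to goal sensitivity, we show that MISL objectives are bounded by formulation-specific downstream goal sensitivities. These bounds establish a precise correspondence between MISL methods and downstream GCRL tasks: for every GCRL formulation, there is a matching MISL objective for which more diverse skills afford greater downstream goal sensitivity. Our results thus lay a theoretical foundation for RL pretraining and have important practical implications, such as suggesting which pretraining objective to use when a user cares about a specific class of downstream tasks.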
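
As a rough illustration of what "skill sensitivity" can mean, the sketch below computes the mutual information I(Z; S) between a discrete skill Z and an outcome state S. This is one common form of MISL-style objective and is an assumption made here for illustration; the abstract does not specify which objectives or bounds the paper analyzes. The function name and the two joint tables are hypothetical.

```python
import math

def mutual_information(joint):
    """Mutual information I(Z; S) in nats for a joint table joint[z][s].

    Assumes the entries are non-negative and sum to 1.
    """
    p_z = [sum(row) for row in joint]        # marginal over skills
    p_s = [sum(col) for col in zip(*joint)]  # marginal over outcome states
    mi = 0.0
    for z, row in enumerate(joint):
        for s, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (p_z[z] * p_s[s]))
    return mi

# Hypothetical example: two skills, two outcome states.
distinct = [[0.5, 0.0], [0.0, 0.5]]         # each skill reaches its own state
overlapping = [[0.25, 0.25], [0.25, 0.25]]  # outcome is independent of the skill

print(mutual_information(distinct))     # equals log(2): outcome reveals the skill
print(mutual_information(overlapping))  # equals 0: outcome is insensitive to the skill
```

In this toy setting, the skill set whose outcomes depend more strongly on the chosen skill has the higher mutual information, which is the intuition behind reading such objectives as sensitivity measures.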

Related papers