Fugu-MT Paper Translation (Summary): Goal-Conditioned Agents that Learn Everything All at Once

Paper summary: Goal-Conditioned Agents that Learn Everything All at Once

arxiv url: http://arxiv.org/abs/2605.23551v1
Date: Fri, 22 May 2026 12:17:09 GMT
Status: Translation complete
Last updated in system: 2026-05-25 17:29:20.338586
Title: Goal-Conditioned Agents that Learn Everything All at Once
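Title (reference translation): Goal-Conditioned Agents That Learn Everything at Once
Authors: Michael Matthews, Matthew Jackson, Michael Beukman, Thomas Foster, Alistair Letcher, Scott Fujimoto, Cédric Colas, Jakob Foerster
Abstract summary: All-goals learning, in which each transition is used for off-policy learning with respect to every goal, lets an agent extract the maximum amount of information. Because doing this via naive relabelling is usually computationally infeasible, the authors instead output values and actions for all goals simultaneously, which enables efficient, parallel all-goals updates. On goal-conditioned Craftax, they show that this approach significantly outperforms other methods.
Reference score (independently computed attention score): 14.217160378270266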
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of which is discarded when only performing on-policy updates with respect to the commanded goal. All-goals learning, where each transition is used for learning off-policy with respect to every goal, allows agents to extract maximal information; however, it is usually computationally infeasible when done via naive relabelling. This can be overcome by jointly outputting values and actions for every goal at once, allowing for efficient, parallel all-goals updates with a single pass through the network, in a process we call Learning Everything all at Once (LEO). We show that this approach significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a >250x speed-up compared to all-goals relabelling. We then go on to show that this approach can be made even more powerful by using LEO as a teacher network, rather than a direct actor. We hope that, by unlocking all-goals learning at scale, LEO can serve as a useful tool for RL practitioners in complex environments. We open source our code.
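Abstract (reference translation): A goal-conditioned reinforcement learning agent exploring an environment sees a wealth of information throughout a trajectory, most of which is discarded when it performs only on-policy updates with respect to the commanded goal. In all-goals learning, each transition is used for off-policy learning with respect to every goal, which lets the agent extract maximal information, but doing so via naive relabelling is usually computationally infeasible. This can be overcome by jointly outputting values and actions for every goal at once in a single network pass, an approach called Learning Everything all at Once (LEO). We show that the method significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a more than 250x speed-up compared with all-goals relabelling. We then show that the approach can be made even more powerful by using LEO as a teacher network rather than as a direct actor. We hope that, by unlocking all-goals learning at scale, LEO will become a useful tool for RL practitioners in complex environments. We open-source our code.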
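The core idea in the abstract, that a single transition can train the agent toward every goal off-policy when values for all goals are produced together, can be illustrated with a minimal tabular sketch. This is not the authors' LEO implementation. It assumes a hypothetical 5-state chain in which every state is also a goal, a sparse reward of 1 on reaching the goal, and plain tabular Q-learning; the names `all_goals_update`, `train`, `greedy_action`, and `rollout` are illustrative. The row `q[s]` holds values for every (goal, action) pair and stands in for a network head whose output has shape `[num_goals, num_actions]`.

```python
import random

# Hypothetical toy setup: a 1-D chain of states 0..N_STATES-1.
# Every state is also a possible goal. Actions: 0 = left, 1 = right.
N_STATES = 5
N_ACTIONS = 2
GAMMA = 0.9
ALPHA = 0.5


def step(state: int, action: int) -> int:
    """Deterministic chain dynamics; moves are clamped at both ends."""
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), N_STATES - 1)


def make_q_table() -> list[list[list[float]]]:
    """Q[s][g][a]: one row per state holding values for every goal at once."""
    return [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(N_STATES)]


def all_goals_update(q, s: int, a: int, s_next: int) -> None:
    """Use one transition to update the value estimate for every goal.

    Q-learning is off-policy, so the transition is valid training data even
    for goals the behaviour policy was not pursuing. Reading q[s_next] once
    yields bootstrap values for all goals (the tabular analogue of one pass).
    """
    next_values = q[s_next]
    for g in range(N_STATES):
        reached = s_next == g
        reward = 1.0 if reached else 0.0
        # The episode for goal g terminates on arrival, so no bootstrap then.
        bootstrap = 0.0 if reached else GAMMA * max(next_values[g])
        q[s][g][a] += ALPHA * (reward + bootstrap - q[s][g][a])


def greedy_action(q, s: int, g: int) -> int:
    """Pick the highest-valued action for goal g (ties go to action 0)."""
    values = q[s][g]
    return max(range(N_ACTIONS), key=lambda a: values[a])


def train(n_transitions: int = 5000, seed: int = 0):
    """Learn from a uniformly random walk that never commands any goal."""
    rng = random.Random(seed)
    q = make_q_table()
    s = rng.randrange(N_STATES)
    for _ in range(n_transitions):
        a = rng.randrange(N_ACTIONS)
        s_next = step(s, a)
        all_goals_update(q, s, a, s_next)
        s = s_next
    return q


def rollout(q, start: int, goal: int, max_steps: int = 10) -> list[int]:
    """Follow the greedy goal-conditioned policy and return visited states."""
    path = [start]
    s = start
    while s != goal and len(path) <= max_steps:
        s = step(s, greedy_action(q, s, goal))
        path.append(s)
    return path


if __name__ == "__main__":
    q = train()
    for goal in range(N_STATES):
        print(goal, rollout(q, 0, goal))
```

Even though the behaviour policy is purely random and never pursues a particular goal, the greedy policy learned for each goal follows the shortest path along the chain. Naive relabelling would instead re-evaluate the value function separately for each (transition, goal) pair; reading `q[s_next]` once per transition is the tabular counterpart of the single forward pass the abstract credits for its speed-up.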


Related papers