Fugu-MT Paper Translation (Abstract): Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models

Paper overview: Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models

arxiv url: http://arxiv.org/abs/2603.15857v1
Date: Mon, 16 Mar 2026 19:39:27 GMT
Status: Translation complete
System update date: 2026-03-18 17:42:06.966923
Title: Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
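Title (reference translation): Regularized latent dynamics prediction is a strong baseline for behavioral foundation models
Authors: Pranaya Jajoo, Harshit Sikchi, Siddhant Agarwal, Amy Zhang, Scott Niekum, Martha White
Abstract summary: Behavioral Foundation Models (BFMs) produce agents that can adapt to unknown rewards or tasks. These methods can produce near-optimal policies only for reward functions that lie in the span of pre-existing state features. This paper proposes Regularized Latent Dynamics Prediction (RLDP), which can match or surpass state-of-the-art complex representation learning methods for zero-shot RL.
Reference score (independently computed attention score): 35.088440282359024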
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Behavioral Foundation Models (BFMs) produce agents with the capability to adapt to any unknown reward or task. These methods, however, are only able to produce near-optimal policies for the reward functions that are in the span of some pre-existing state features, making the choice of state features crucial to the expressivity of the BFM. As a result, BFMs are trained using a variety of complex objectives and require sufficient dataset coverage, to train task-useful spanning features. In this work, we examine the question: are these complex representation learning objectives necessary for zero-shot RL? Specifically, we revisit the objective of self-supervised next-state prediction in latent space for state feature learning, but observe that such an objective alone is prone to increasing state-feature similarity, and subsequently reducing span. We propose an approach, Regularized Latent Dynamics Prediction (RLDP), that adds a simple orthogonality regularization to maintain feature diversity and can match or surpass state-of-the-art complex representation learning methods for zero-shot RL. Furthermore, we empirically show that prior approaches perform poorly in low-coverage scenarios where RLDP still succeeds.
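Abstract (reference translation): Behavioral Foundation Models (BFMs) produce agents that can adapt to unknown rewards or tasks. However, these methods can produce near-optimal policies only for reward functions that lie in the span of pre-existing state features, so the choice of state features is essential to the expressivity of a BFM. As a result, BFMs are trained with a variety of complex objectives and need sufficient dataset coverage to learn task-useful spanning features. This paper examines whether such complex representation learning objectives are necessary for zero-shot RL. Specifically, it revisits the objective of self-supervised next-state prediction in latent space for state feature learning, and observes that this objective alone tends to increase state-feature similarity and thereby reduce span. It proposes Regularized Latent Dynamics Prediction (RLDP), a method that adds a simple orthogonality regularization to maintain feature diversity and can match or surpass state-of-the-art complex representation learning methods for zero-shot RL. It further shows empirically that prior approaches perform poorly in low-coverage scenarios where RLDP still succeeds.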
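
The abstract describes the objective only in words: a latent next-state prediction loss plus an orthogonality regularizer that keeps state features from becoming too similar. The following is a minimal sketch of that idea, not the authors' implementation. The abstract gives no formula, so everything here is an assumption made for illustration: the function names, the squared-error prediction term, the squared-cosine-similarity form of the penalty, and the weight `lam`.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (assumes the vector is nonzero)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def prediction_loss(predicted, target):
    """Mean squared error between predicted and encoded next-state latents."""
    total = sum(
        (p - t) ** 2
        for pred_vec, tgt_vec in zip(predicted, target)
        for p, t in zip(pred_vec, tgt_vec)
    )
    count = sum(len(pred_vec) for pred_vec in predicted)
    return total / count


def orthogonality_penalty(features):
    """Mean squared cosine similarity over all pairs of distinct states.

    Assumed form of the regularizer: 0 when features of different states
    are mutually orthogonal, 1 when they all point the same way.
    Needs at least two feature vectors.
    """
    unit = [normalize(f) for f in features]
    n = len(unit)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(dot(unit[i], unit[j]) ** 2 for i, j in pairs) / len(pairs)


def regularized_loss(predicted, target, features, lam=1.0):
    """Prediction loss plus a weighted orthogonality penalty (lam is hypothetical)."""
    return prediction_loss(predicted, target) + lam * orthogonality_penalty(features)


# Two toy feature sets: one collapsed (identical features), one diverse.
collapsed = [[1.0, 0.0], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0]]

# With perfect next-latent prediction, only the penalty separates the two.
loss_collapsed = regularized_loss(collapsed, collapsed, collapsed)
loss_diverse = regularized_loss(diverse, diverse, diverse)
```

In this toy case both feature sets predict their next latents perfectly, so the prediction term alone cannot tell them apart. The penalty is what makes the collapsed features cost more than the diverse ones, which is the failure mode the abstract attributes to unregularized latent prediction.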

Related papers