Fugu-MT paper translation (abstract): Lifting Embodied World Models for Planning and Control

Paper overview: Lifting Embodied World Models for Planning and Control

arxiv url: http://arxiv.org/abs/2604.26182v1
Date: Tue, 28 Apr 2026 23:59:19 GMT
Status: translation complete
Last updated in system: 2026-04-30 15:59:36.203508
Title: Lifting Embodied World Models for Planning and Control
Title (reference translation): Lifting Embodied World Models for Planning and Control
Authors: Alex N. Wang, Trevor Darrell, Pavel Izmailov, Yutong Bai, Amir Bar
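Abstract summary: We train a lightweight policy that maps high-level actions to sequences of low-level joint actions. We instantiate this framework for a human-like embodiment, defining the high-level action space as a small set of 2D waypoints. We show that the lifted world model substantially outperforms searching directly in low-level joint space.
Reference score (independently computed attention measure): 59.09016913513998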
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: World models of embodied agents predict future observations conditioned on an action taken by the agent. For complex embodiments, action spaces are high-dimensional and difficult to specify: for example, precisely controlling a human agent requires specifying the motion of each joint. This makes the world model hard to control and expensive to plan with as search-based methods like CEM scale poorly with action dimensionality. To address this issue, we train a lightweight policy that maps high-level actions to sequences of low-level joint actions. Composing this policy with the frozen world model produces a lifted world model that predicts a sequence of future observations from a single high-level action. We instantiate this framework for a human-like embodiment, defining the high-level action space as a small set of 2D waypoints annotated on the current observation frame, each specifying a near-term goal position for a leaf joint (pelvis, head, hands). Waypoints are low-dimensional, visually interpretable, and easy to specify manually or to search over. We show that the lifted world model substantially outperforms searching directly in low-level joint space ($3.8\times$ lower mean joint error to the goal pose), while remaining more compute-efficient and generalizing to environments unseen by the policy.
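Abstract (reference translation): World models of embodied agents predict future observations conditioned on the action the agent takes. For complex embodiments, the action space is high-dimensional and hard to specify: for example, precisely controlling a human agent requires specifying the motion of every joint. This makes the world model hard to control and costly to plan with, because search-based methods such as CEM scale poorly with action dimensionality. To address this, we train a lightweight policy that maps high-level actions to sequences of low-level joint actions. Composing this policy with the frozen world model yields a lifted world model that predicts a sequence of future observations from a single high-level action. We instantiate the framework for a human-like embodiment and define the high-level action space as a small set of 2D waypoints annotated on the current observation frame, each specifying a near-term goal position for a leaf joint (pelvis, head, hands). Waypoints are low-dimensional, visually interpretable, and easy to specify by hand or to search over. We show that the lifted world model substantially outperforms searching directly in low-level joint space ($3.8\times$ lower mean joint error to the goal pose), while remaining more compute-efficient and generalizing to environments unseen by the policy.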
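The composition described in the abstract (a policy that expands one high-level action into a low-level sequence, rolled out through a frozen world model, then searched over with CEM) can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the authors' implementation: `toy_world_model`, `toy_policy`, `HORIZON`, the joint names, and all CEM hyperparameters are hypothetical stand-ins. The "world model" here simply integrates 2D displacements instead of predicting observations, and the "policy" is a hand-written interpolation rather than a trained network.

```python
import numpy as np

# Hypothetical names and sizes; the abstract only lists pelvis, head, hands.
LEAF_JOINTS = ("pelvis", "head", "left_hand", "right_hand")
HORIZON = 8  # assumed number of low-level steps per high-level action


def toy_world_model(pose, action):
    """Stand-in for the frozen world model: advance one low-level step.

    pose, action: arrays of shape (num_joints, 2). The real model predicts
    future observations; this toy only adds a displacement.
    """
    return pose + action


def toy_policy(pose, waypoints, horizon=HORIZON):
    """Stand-in for the lightweight policy: waypoints -> low-level sequence.

    Returns shape (horizon, num_joints, 2): equal steps toward each waypoint.
    """
    step = (waypoints - pose) / horizon
    return np.repeat(step[None], horizon, axis=0)


def lifted_world_model(pose, waypoints):
    """Compose policy and world model: one high-level action -> rollout."""
    rollout = []
    for action in toy_policy(pose, waypoints):
        pose = toy_world_model(pose, action)
        rollout.append(pose)
    return np.stack(rollout)  # (horizon, num_joints, 2)


def mean_joint_error(pose, goal):
    """Mean Euclidean distance between corresponding joints."""
    return float(np.linalg.norm(pose - goal, axis=-1).mean())


def cem_plan(pose, goal, iters=10, pop=64, n_elite=8, seed=0):
    """Cross-entropy method over the low-dimensional waypoint space."""
    rng = np.random.default_rng(seed)
    mean, std = pose.copy(), np.ones_like(pose)
    for _ in range(iters):
        # Sample candidate waypoint sets around the current distribution.
        samples = mean + std * rng.standard_normal((pop,) + pose.shape)
        # Score each candidate by the final pose of its lifted rollout.
        costs = np.array([
            mean_joint_error(lifted_world_model(pose, w)[-1], goal)
            for w in samples
        ])
        # Refit the sampling distribution to the lowest-cost candidates.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean


if __name__ == "__main__":
    start = np.zeros((len(LEAF_JOINTS), 2))
    goal = np.array([[1.0, 0.0], [1.0, 1.5], [0.5, 1.0], [1.5, 1.0]])
    waypoints = cem_plan(start, goal)
    final_pose = lifted_world_model(start, waypoints)[-1]
    print("mean joint error:", mean_joint_error(final_pose, goal))
```

In this toy setup CEM searches over 8 numbers (4 joints times 2 coordinates) instead of the 64 numbers of the full low-level sequence (8 steps times 8 coordinates), which is the dimensionality argument the abstract makes; the gap would be larger for a full-body joint action space.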


Related papers